Easy, Reliable Distributed Storage and Backup?
RichiH writes "Most of you are the free IT staff of friends and family, just as I am. One of my largest headaches is backing up their data. What I am looking for allows for off-site storage on multiple server machines running Linux, has Linux & Windows clients that Just Work and require zero everyday effort (although a large-ish effort to set them up is just fine), allows for granular access control, is versioned and will, ideally, allow me to grab data automagically (think photo pool for your family where your mother, sister, etc., share each other's photos). This is something I've been trying to find for years, but I've never seen anything even closely resembling what I want. With the Wall Street Journal handing out its Technology Innovation Award to Cleversafe recently, I was once again reminded of this particular itch which needs scratching. Before I deploy it, I want to ask the Slashdot community for its opinion on that piece of software, and on potential alternatives. How do you solve this problem?"
Git and the git-web web based tool are very useful for maintaining a tree of archived data, and browsing it.
How about Mozy? I really like Duplicity, but it's probably not for these users if they're asking you for help.
What you are seeking is still in the realms of science fiction, and would probably cost a bomb as well. Good luck with your search, please let us all know if you find this digital nirvana.
Rename your data to 'Barely legal college girls having first time sex - XXX Vol1/256.r001' and use p2p to spread them all over the world!
I can tell you how I solve it in a business context, but whether or not it could be scaled down to personal I'm not sure.
The problem: 2 sites each with 70-100GB of data need offsite backup with similar criteria to your own. Bandwidth available to these sites is 2-4Mbps. The only OS involved is Linux, though I'm sure Windows could be shoehorned in somehow. A third site which has a tape streamer and someone to take tapes offsite is available. Data protection legislation means that storing it with a hosted service is illegal unless I encrypt it myself before sending it offsite - I'm only aware of one tool which claims to be able to do this and still send data as a binary delta (it uses the rsync library) and that tool is still not particularly common in Linux distributions and not very widely used. I'm nervous of trusting my backups to a tool that isn't in heavy use, particularly if strong encryption is being employed.
The Solution: A server in the third site and some judicious scripting with rsync allows it to mirror the data in the other two sites. The first sync is fairly painful, of course, but provided you don't have too much data regularly changing, subsequent syncs aren't too bad. The server is backed up to tape, which provides versioning, so if someone realises a week after the fact that they lost a file, it can still be restored.
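For anyone wanting to try the same pull-style mirror, here is a minimal Python sketch of the kind of scripting described above. It is not the poster's actual script: the host names, paths, and the 200 KB/s bandwidth cap are all made-up placeholders, and it only uses standard rsync options (-a, -z, --partial, -e, --bwlimit, --delete).

```python
import shlex
import subprocess


def build_mirror_command(remote, local_dir, bwlimit_kbps=None, delete=False):
    """Return an rsync argv list that pulls `remote` into `local_dir` over ssh."""
    cmd = ["rsync", "-az", "--partial", "-e", "ssh"]
    if bwlimit_kbps is not None:
        # Cap the transfer rate so the mirror does not saturate a slow link.
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    if delete:
        # Only a true mirror removes files that vanished from the source.
        cmd.append("--delete")
    # Trailing slashes make rsync copy the directory contents, not the directory.
    cmd += [remote.rstrip("/") + "/", local_dir.rstrip("/") + "/"]
    return cmd


# Hypothetical site names and paths, for illustration only.
SITES = {
    "site-a": "backup@site-a.example.com:/srv/data",
    "site-b": "backup@site-b.example.com:/srv/data",
}


def mirror_all(dry_run=True):
    """Mirror every site; with dry_run=True only print the commands."""
    for name, remote in SITES.items():
        cmd = build_mirror_command(remote, f"/backup/{name}", bwlimit_kbps=200)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    mirror_all(dry_run=True)
```

Run from cron on the third-site server, something like this would leave the tape job to provide the versioning, as in the setup above.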
Initial effort to set up was pretty great but now that it's done, it JFW and requires no brain power whatsoever to run on a daily basis. I can make the data available over the VPN (of course the access speed will be dog slow) more-or-less immediately and I can make it available at LAN speed by copying it to a hard disk and couriering it to the remote office in under 48 hours. A full restore of 100GB across a 2Mbps connection will take at least 4-5 days.
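The restore estimate is easy to check. A small sketch, assuming decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s) and a link that runs flat out with no protocol overhead; the efficiency parameter is my own addition for modelling a less ideal line:

```python
def transfer_days(size_gb, link_mbps, efficiency=1.0):
    """Days needed to move size_gb gigabytes over a link_mbps link."""
    bits = size_gb * 8 * 1000**3                      # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1000**2 * efficiency)
    return seconds / 86400                            # seconds per day


for mbps in (2, 4):
    print(f"100 GB at {mbps} Mbps: {transfer_days(100, mbps):.1f} days")
# prints:
# 100 GB at 2 Mbps: 4.6 days
# 100 GB at 4 Mbps: 2.3 days
```

So 4-5 days at 2Mbps is the best case; any overhead or competing traffic pushes it higher.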
You're asking two questions. The first is that you want backup, so that all their data just gets thrown somewhere and they lose only the last few days' work if their hard drive dies. You don't even necessarily want this on the network; just back up to a DVD-R every so often, and take every month's DVD-R offsite (a friend's house, a bank's vault, whatever). There's lots of backup software for this. Most can do fancy stuff like incremental backups. You can probably find something opensource you can host for your friends and family on a decently-available server.
The second question is networked file storage, where you don't care about automatically archiving files, but you do want frequent access and a good UI. For this I recommend something like Dropbox, which has good support for OS integration and a web interface.
Have a look at http://allmydata.org/trac/tahoe which might provide what you're looking for while being way simpler to set up than Cleversafe.
Ars technica did a nice review of Dropbox, titled, "How Dropbox ended my search for seamless sync on Linux" (but it works on OSX & Windows too) http://arstechnica.com/news.ars/post/20080914-how-dropbox-ended-my-search-for-seamless-sync-on-linux.html
what's wrong with getting an account with Connected/Iron Mountain - easy to use intelligent online storage that doesn't cost a lot - saved my bacon many a time
Have you considered the JungleDisk client that works with the Amazon S3 storage cloud? This has backup clients for Windows, Linux, and Mac and with suitable configuration of 'buckets' would allow you to do most of what you are trying to achieve. Okay so it's a pay-for service (albeit cheap) but it does provide the all important off-siting, strong security/encryption and unlimited capacity.
"Only wimps use backup. Real men just upload their important stuff on ftp, and let the rest of the world mirror it."
God
I looked at Cleversafe, trying to get through the PR bubblespeak. It seems they are emulating disks, not offering integrated _backup_. As saving from my mom's SD card to a distributed online disk via a DSL line is not feasible, I will most likely need to scratch that idea.
Backup isn't the same as sharing. And do you want actual replication or merely fault tolerance to node failure? Actual n-fold replication means you're going to pay n times the amount of money for storage. And why do you insist on one application to do everything?
My suggestion: set up automatic backups to one of the many backup services on the net. They worry about how to replicate your data, you don't have to. For the same service to support both backup and sharing is hard and it's probably a bad idea. It's much easier if you know that the backup service simply cannot access the contents of any of your files.
For sharing, use services designed for that: Flickr Pro, Picasa, Google Docs, whatever. They are designed for sharing, they know about users and permissions, and they can only publish what you actually upload to them.
As for Cleversafe, the idea is as old as forward error correction, but the economics and management never seem to quite work out. And basically, you're getting the same functionality from hosted storage: Amazon, Google, Box.NET, etc. are already figuring out how to keep your data available and secure, and are probably doing a better job than you could do with a homebrew system.
No Linux client, AFAIK (though I do run it on my MBP). It's become rather impractical for me as a photographer though, as sometimes I'll shoot enough photos that my internet connection would be completely maxed out for days on end trying to sync up the new data - and I have a decent-for-cable 1Mbps upload rate.
rsync to Amazon S3 might be an option, if only for cross-platform capabilities. No versioning though, but outside of Apple's Time Machine (obviously useless for Windows and Linux), you're not going to get that without some major headache. Any remote system is going to be horribly slow for the first sync with any typical internet connection, and quite possibly problematically slow for photographers, media hoarders, and in general people with big hard drives.
How are sites slashdotted when nobody reads TFAs?
The subject says it all:
- rdiff-backup to back up your data to one backup server.
- chironfs to clone the file system to another remote server.
rdiff-backup runs on *nix and windows (with the help of Cygwin).
Once set up, rdiff-backup needs virtually no maintenance. If needed, set up Nagios to warn you if things run afoul.
Used this for years, never disappointed me so far!
If you had only Windows and Mac, I'd opt for Mozy (http://www.mozy.com) which is owned by EMC. It's $50/year for unlimited storage and their agent is unobtrusive and backs up even open files.
The downside is that it limits upstream bandwidth to 1Mb/s, so your initial backup might take a week. But after that, it takes 3 minutes a night and it does it without prompting. I've strong-armed my immediate family into using it because it also allows me to monitor remotely the status of all backups.
It's seriously good stuff.
You were mistaken. Which is odd, since memory shouldn't be a problem for you
As an alternative, use tape. It may not have the shine of offsite backups, but if you need data backed up reliably, one easy option might be a DLT or another recent capacity tape drive. Combine a backup program that does encryption (bru, amanda, zmanda) and then set up a contract with Iron Mountain.
Then, if you do a basic tape rotation schedule, periodically running recent tapes offsite, you should be protected against known disasters. And, because the tapes are encrypted with a high quality and long passphrase (this is assuming), if the tapes get lost or stolen, they won't do an attacker any good.
On the low end, if tape is too expensive, there is the option of purchasing external mini hard disks that only require power via the USB ports, combining those with TrueCrypt or another sturdy encryption program, and using those instead of tapes.
Get 4 x 1TB disks and set up RAID 6 at minimum. Install Linux. Install rsnapshot, which offers:
* Filesystem snapshot - for local or remote systems.
* Database backup - MySQL backup
* Secure - Traffic to and from the remote backup server is always encrypted using openssh
* Full backup - plus incrementals
* Easy to restore - Files can be restored by the users who own them, without the root user getting involved.
* Automated backup - Runs in background via cron.
* Bandwidth friendly - rsync used to save bandwidth
You may also find a CentOS or Debian tutorial useful.
Good luck!
You're merely proposing data replication again.
TFA mentioned Cleversafe, and specifically wanted feedback on that "dispersal" approach as it seems better than simple replication.
If I understood TFA correctly, he proposes to develop (although he wrote "deploy") something similar, hence wants feedback on the idea first.
http://www.bacula.org/
Runs pretty tight (low bandwidth), supports channel encryption and datastore encryption, can even create Bare Metal Recovery disks. I have a server room with LTO3 tape drives that I use to backup my clients' incremental data changes nightly, including Linux, Mac and Windows clients and servers. I have VPN's out to each client, so don't use the built-in channel encryption, but I maintain a keypair for each client.
Backup only, but I /could/ present a maintained volume as a share over the VPN. Bacula supports disk and tape volumes as backup stores. I've personally had no need to do that to date.
We're not talking terabytes here - my ISP would pwn me if that was going on, but I do circa 20G of data changes every night from clients. Some of them are laptops that are not always on or connected. Most are friends and family PC's, so it backs up when it can. I have to do almost no maintenance apart from changing a tape occasionally. The backup client is tiny and unobtrusive, even when running. On Windows it uses VSS, so it is reliable.
I have had a number of panic phone calls (esp from my kids at Uni) who have lost a thesis or the like and are utterly amazed when, after a few clicks over the phone they look at their webmail and yesterday's version is in their inbox. That's what it's all about! I am the god of lost data! Which, of course, works for me.
There are a bunch of people offering this sort of service (or build your own) on Amazon's S3. It has the advantage of being accessible to everyone, has the security built in and you only have to worry about the data not server availability.
Backup not on the cloud just doesn't make much sense to me these days.
An Eye for an Eye will make the whole world blind - Gandhi
I think this may have been said before, but what's wrong with setting up a basic samba server on one of your machines, and then simply using cron (Mac/Linux) or a scheduled task (Windows) to dump the backups across the WAN via rsync/scp? (Depending on how important managing multiple versions of the same file is, perhaps using cron on the server side to do some SVN magic would make sense.)
You'd be able to allow multiple users access to other folders with simple Samba ACLs, and it'd all be right on their desktop with an interface they're used to. As far as maintaining the backend, simple rsyncs between your linux servers keep everything up to date in the case of node failure.
The solution also allows for easy drop-in replacement in terms of switching target backup/sharing servers. All you would need to do is email your new rsync script and tell LuserX to put it in XYZ directory. There'd be no walking them through configuring a new backup application they've probably never seen before.
Clean and simple, but certainly not feature-rich.
RichiH obviously is more a Stallman guy asking for a diy and possibly opensource solution, that kept all his family data on privately owned systems...
so why all the jungle/dropbox/flickr answers?
has slashdot sunken that deep?
what i do:
- a fileserver with raid5 at my place and one at my parents'
- nightly rsync replication of their data to my server and my data to theirs over ssh
- ... so we always have two copies of the data and local redundancy
- allows fast access to all data even huge amounts in the hundreds of gigs dimension and also if internet is somehow slow or down
- this of course is no backup in the classic meaning! however rsync does not delete data on the replication site if you don't tell it to, so you kind of have protection against mistakenly deleted data too (unlike with only raid)
For backups, I'd recommend one of the many online services. I used to do backups using a custom shell script, but you really can't beat the online services in terms of ease-of-use. Personally, I use Mozy. It's $55 a year for unlimited storage, but they offer 2GB free (and for a lot of folks, that's really all they need). If you have a lot of computers, you can set up a single account to manage all of them.
As for file sharing, if all you're doing is sharing photos, I'd recommend a site like flickr. For other stuff, dropbox seems to work well.
For storing permissions and the such, are you using a .tar container? My biggest stumbling block with my backup scheme is storing ACLs and permissions.
I've got a few ideas about doing it, but they're all kludgy or force me to walk away from my rsync scripts which are really fairly mature at this point. Furthermore, I need to get deltas downstream and packing everything in to one file pretty much defeats that purpose at the several gig level unless I'm running an rsync server to calculate the diffs. These kinds of things become problematic due to the infrastructure I'm working with.
I'm really starting to lean towards running everything over iSCSI, but then I've got to get the VPN thing going which could require some re-subnetting at either end of the tunnel. Needless to say, I'd prefer to avoid that or any other solution that requires messing with stuff that Works Right Now.
Have you dealt with these issues at all, or at least know what won't work? I'd appreciate any insights before I use a brute force method.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Check out S3Backer. It lets you mount an Amazon S3 bucket to your Linux/Mac/BSD/*NIX box. GPL F/OSS as icing on the cake.
As well as all of the standard things you'd expect from a networked filesystem (ACLs, authentication, and so on).
If you set up an AFS cell with your volumes replicated across a few remote servers and get your clients to connect to this cell then it should be fine. Set a cron job to take regular snapshots, and dump them to some offline medium periodically.
I am TheRaven on Soylent News
I found and tested the predecessor of the following device (which I can recommend on basis of a year-long test of a sample with N=1): Bubba (see http://excito.com/bubba/about-bubba.html ). A Swedish NAS device. I have to note that it's certainly not "distributed" in the sense that it's easy to mirror data across multiple devices (I didn't try and wouldn't know an easy way of doing that). It's basically a server, so you'd still need to take care of backups yourself.
It's a metal box the size of a lunch-box, contains a HDD, a PowerPC processor, two ethernet interfaces, and comes pre-loaded with Linux 2.6 (Debian Etch), and has a web-based control interface for adding users (see http://excito.com/bubba/about-bubba.html ). It can act as a server (Samba), torrent and email downloader, and router (if you want). It's got decent tech support through this forum (see http://forum.excito.net/). You can buy the box with or without HDD.
Nevermind the website (they brought in a consultant who made something I really dislike), the box and its applications are solid. Have a look and see if it's what you need.
Not sure if I understood your request correctly but check out Wuala. Great for storing and sharing information in a secure manner over the internet.
First set each computer up with a dyndns account so that remote administration is easy.
Then set up folders in each computer for each member of the family. For each family member's main computer, make symbolic links to other family members picture folder, etc.
Set up a schedule to use rsync to copy the contents of the folders on a daily basis.
While you are at it, I suggest adding one more computer to the mix that will copy the home folders for all family members and keep them in a svn folder so they can call you to undelete files.
Help! I'm a slashdot refugee.
Try JungleDisk http://www.jungledisk.com/ It uses Amazon S3 Storage.
What about Mozy? http://mozy.com/ 2GB for free, or $5 a month for unlimited storage. Does versioning and is really easy to use.
I use JungleDisk to backup everything to Amazon S3.
BackupPC might do what you're after. From the blurb:
high-performance, enterprise-grade system for backing up PCs
BackupPC is disk based and not tape based. This particularity allows
features not found in any other backup solution:
* Clever pooling scheme minimizes disk storage and disk I/O.
Identical files across multiple backups of the same or different PC are
stored only once (using hard links), resulting in substantial savings
in disk storage and disk writes.
* Optional compression provides additional reductions in storage.
CPU impact of compression is low since only new files (those not already
in the pool) need to be compressed.
* A powerful http/cgi user interface allows administrators to view log files,
configuration, current status and allows users to initiate and cancel
backups and browse and restore files from backups very quickly.
* No client-side software is needed. On WinXX the smb protocol is used.
On linux or unix clients, rsync or tar (over ssh/rsh/nfs) can be used
* Flexible restore options. Single files can be downloaded from any backup
directly from the CGI interface. Zip or Tar archives for selected files
or directories can also be downloaded from the CGI interface.
* BackupPC supports mobile environments where laptops are only intermittently
connected to the network and have dynamic IP addresses (DHCP).
* Flexible configuration parameters allow multiple backups to be performed
in parallel.
* and more to discover in the manual...
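The pooling idea in that blurb (identical files stored once, with hard links) can be illustrated with a short Python sketch. This is not BackupPC's actual code or on-disk layout: naming pool entries by SHA-256 digest is my own assumption for the example, and the function names are invented. The pool and the backup trees have to live on the same filesystem for hard links to work.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_in_pool(src, pool_dir, dest):
    """Copy src into the pool once, then hard-link dest to the pooled copy."""
    os.makedirs(pool_dir, exist_ok=True)
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    pooled = os.path.join(pool_dir, file_digest(src))
    if not os.path.exists(pooled):
        shutil.copy2(src, pooled)   # first time this content has been seen
    os.link(pooled, dest)           # raises FileExistsError if dest already exists
    return pooled
```

Backing up the same photo from two different PCs then costs one copy of the data plus two directory entries, which is where the disk savings come from.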
"I think it would be a good idea" Gandhi, on Western Civilisation
It's simple. It works.
Believing something doesn't make it true. Not believing something doesn't make it false.
I wrote the open source backup tool "Gazoo!" to perform fast reliable backups using rsync. Think of it as a command-line TimeMachine. ;-)
Give it a whirl! Never lose a file again! :-D
www.jungledisk.com - does all you want, it's cheap and it's hosted on Amazon's multiple servers.
I'll see your hokum and raise you a boondoggle.
cough ... JungleDisk ... cough
I think that the issue is faced by far more people than is readily apparent... it's the need for a VERY easy to use tool to share Our Stuff with Our Family. If my Mom and sisters were able to share all their photos with each other by carrying a USB drive around when they see each other... the most important thing they have on their computers would be backed up... the need for social file sharing is huge... we just don't have the tools to do it well yet. Something that does auto-discovery of stuff, remembers previous decisions, and just goes to work making copies in the right directions is what we need.
I use Spideroak ( https://spideroak.com/ ). They do backup and sharing like you need.
I just didn't want to deal with it. I use cloudbackup.openrsm.com and have them buy an account. It can do a whole network of Linux, MAC, and Windows machines with one account, or just a laptop. The client software is free and does network drive of the backup space too. I figure easy and my friends paying for it works. It's saved my butt a couple times too.
AhSay's free version of their Offsite Backup Server (http://www.ahsay.com/en/freeedition/ahsay_free_edition_index.html) does versioning and, well, everything you're really asking for. I use this at work with about 20 clients, and it's rock solid.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
JungleDisk looks good.
I have no idea what type of cost that's going to incur after a while though. It's probably going to be like an additional cell phone bill per month, which I don't know is feasible.
Another option I kind of thought of--can you buy a server from a datacenter at a one time fee, put in as much storage as you want, and then just get charged for bandwidth? The up front costs of that would be fairly high, but it seems like it would be the cheapest long run cost, IF you could find some place willing to do it.
I do this commercially. I ship a small embedded box with custom firmware that works as a samba client and runs a VPN back to my server.
Then I run rsnapshot to rsync the remote.
That way the clients don't have to do any installation at all, I can admin my box remotely without any local representation, and it will work with any system as long as it supports samba.
The only setup required on the local site is a userid for the backup client.
The devil, of course, is in the details.
Google it for more info, it will do everything you are asking for in a secure manner.
Living in Chile
It supports rsync, ssh, tar, and SMB. Performs pooling which reduces the number of stored files. Only issue is it uses the local account password file, so you'd have to set up an account for each user you wanted to give direct access to. http://backuppc.sourceforge.net/
I use SVN to back up my sisters' important stuff to my home server. It was easy to teach them to commit changes and add new files to be versioned because I installed Tortoise SVN on their Windows computers. It has full versioning and can use an encrypted link if that's important.
Everything else just seemed like too much work to implement.
You've not purchased a Mac in the last five years.
I run a company called Real Pro Data Solutions, LLC.
You can check us out at - http://www.realprodata.com/
We run on open standards and can provide assistance with setup. We have solutions that will work on all of your platforms, Windows, Linux, Mac...
We're a small business so you can always work with the same people who helped build the company!
And you can actually call and talk to us! :-)
My company also struggled with an easy to use and robust offsite backup solution that backed up Windows servers with exchange along with our Linux and Unix servers running MySQL and other services. Backups were always a large headache for us and our customers. About two years ago we started testing multiple backup systems and running large scale recovery scenarios for our company and our customers. We needed it to be dead simple to use for both customers and administrators, offer revision control, encrypt the data, compress the data, and be able to store the data for up to 7 years or longer. In our quest to find and implement the ultimate solution we started our own company offering offsite backup, and we had over two years of testing and breaking many other solutions. Now we are the company our customers look towards to ease the stress of their backups both large and small. http://datumguard.com/
rsync to Amazon S3 might be an option, if only for cross-platform capabilities. No versioning though, but outside of Apple's Time Machine (obviously useless for Windows and Linux), you're not going to get that without some major headache.
Server running opensolaris/*bsd with ZFS, rsync to that, create a snapshot every day.
Don't most businesses already do this? On laptops, I used roaming profiles, and synched My Docs with the user's home directory on the server. All additional backups, versioning, etc. were handled on, and by the server.
Downside is it's not a complete solution, as any data stored in Program Files or Common Files dirs wasn't mirrored.
Upside is that it's simple network management, and even lets you use login scripts.
I don't think you're ever going to find a 'simple' (as in 3 clicks) solution usable by non-techies with versioning. Backup, yes. Sharing, versioning....not so much. It looks like you're simply going to have to be the server admin, and let the server deal with the versioning, multiple sites, and sharing.
I've been using Mozy.com for back ups. The client is totally unnoticeable after it's installed. It just runs some time each day when you are not using your PC. Handles large files like .pst nicely as well.
They give you 2GB free or unlimited for 5USD/month and bonus space if you refer people :)
https://mozy.com/?code=WAQ9DM/ and scroll down to 2GB free offer.
What I personally use is all linux so it may not be much use to you however, i've gone thru several iterations of a simple script for my machines.
Originally it was just a usb connected drive at home and work that got rsynced to (it would look for an lvm volume with a specific name and if it existed, kick it off). The first iteration of the script basically just rsynced the data across and once a week did a dump. The second iteration took lvm snapshots on top of that (with some minor automated management) and added rsync --delete to the mix (which deletes files on the target that don't exist on the source).
The next iteration was a much bigger change (and probably the smarter one): it would rsync and snapshot only files that didn't come from (or had changed from) packages installed on the machine, i.e. it would go through all the files on my hard drive, and if a file was from a package it would leave it alone (getting the OS back is simple; then just overlay the backup stuff on top). A little while ago I switched to zfs for the external drive (my only real regret is that zfs will probably never be a part of the linux kernel) and that's been pretty good because zfs is a brilliant filesystem.
In the OSS space i've played with things like afs, coda, drbd and things that wrap around svn and cvs (how I wish lvm had replication in built).
But, having worked in the big-boy space for a long time i've seen alot of commercially available implementations most of which are available cross platform. Some are based on backup solutions (netbackup, backupexec, backbone, etc) and some work at the storage level.
One implementation I was mostly impressed with though was using falconstor. It's basically block-level replication software that connects to iscsi volumes and is really quite impressive in the way it manages backing up data. The company itself had mostly rover types who were in the office maybe once a week; the rest of the time their volumes would sync across links of varying speed (even over vpn), and it could be configured not to chew up the entire link. It all spoke back to an iscsi storage device (EqualLogic or EMC with its iscsi head, I can't remember) and was also snapshotted occasionally. The best part about it was they never really had to deal with it, it just seemed to work 99.99% of the time.
So far i've found the OSS side a little friendlier to those who know how to use them (mostly because they're just so much easier to modify), while the commercial side do everything you expect them to with varying degrees of success without being flexible enough.
Of course, the one definitive thing i've found with people is that you'll always find someone who'll end up saving data in odd locations then get cranky when you cant restore it because you just weren't backing up c:\windows\temp ;).
The whole "you can please some of the people some of the time" holds true 99% of the time.
Datto's new line uses ZFS with snapshots. If they're willing to spend a couple hundred bucks, it's a really easy (and foolproof) solution.
The current state of open source backup technology is abysmal. Currently, I'd say the reliable option would be rsyncing to a large, removable hard drive, and then couriering it to a remote location or "secure" physical storage service.
For "long term" backup, get a DLT tape drive, and selectively backup to tape. The tape, if properly stored, will be more likely to recover data than a hard drive. Also note, this is a few hundred dollar investment, with large capacity DLT tapes going for a hundred a pop as well.
=====
As anyone who's experienced this can attest, DVD-R/+R media really sucks for long term data storage. (We're talking 5+ year ranges.) But the latest error correction technology (parchive2) has got me rethinking the problem.
parchive2, for those not familiar with it, uses the Reed-Solomon ECC algorithm to produce checksum files against a datafile (or set of datafiles). It's conceptually similar to raid5 data storage. So even with an inevitable failure of a sector of DVD-dye, you can still recover the data intact after running a reconstruction. A beautiful demonstration of a datafile's recoverability, with even 15% loss of data, would be to download a DVD (or even bluray disk) off USENET NEWS, delete a few data chunks, and then run par2 to rebuild the lost chunks. (And unrar to restore the original DVD disk.)
The general idea is that you process the directory you want backed up using the RAR program (which "archives" it into a dump file, and then chops up the dump file into even pieces), create enough par2 data to allow a recovery with 20% data loss, and then split the data files and the par2 ECC files over 6+ DVD disks. (That would allow a recovery, even if you lost a disc and lost a few sectors on another.)
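To make the raid5 comparison concrete, here is a toy Python sketch of the recovery principle. It uses plain XOR parity, not Reed-Solomon, so it is a deliberate simplification of what par2 does: one parity piece can rebuild exactly one lost chunk, whereas par2 can be told to survive a chosen percentage of loss. The function names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def split_with_parity(data, n_chunks):
    """Split data into n_chunks equal pieces (zero-padded) plus one parity piece."""
    size = -(-len(data) // n_chunks)              # ceiling division
    padded = data.ljust(size * n_chunks, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n_chunks)]
    return chunks, xor_blocks(chunks)


def rebuild_missing(chunks, parity):
    """Rebuild the single chunk marked None from the survivors and the parity."""
    missing = chunks.index(None)
    survivors = [c for c in chunks if c is not None]
    restored = list(chunks)
    # XOR of all surviving chunks and the parity yields the lost chunk.
    restored[missing] = xor_blocks(survivors + [parity])
    return restored
```

Think of each chunk as one disc: lose any one disc and the other discs plus the parity disc give it back. Surviving a whole disc plus stray bad sectors elsewhere is what needs the stronger Reed-Solomon coding.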
One "trivial" problem would be to write up a convenient utility to automate this whole process. The difficult problem, as I see it, is to increase data survivability when the DVD header block is lost, and the whole DVD disk becomes unreadable. I suspect there are alternate forms of DVD data encoding which allows one to either retrieve an alternate header block, or do sector recovery of disc data.
A payoff here would be awesome; one wouldn't need to split a backup over many DVD disks, and still ensure data recovery, say, 10 years from now. So I will have to rip into hardware DVD ISOs to get an answer, but if someone has a better esoteric knowledge of DVD technology, any help would be much appreciated.
There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
I'm surprised no one has mentioned Wuala - www.wua.la - which is a distributed online storage system. You agree to store (encrypted) bits of others' files in exchange for the ability to do so on others' machines across the wuala network. It's free and pretty damn cool. They can explain it better than I can: http://wua.la/en/learn/why
wuala works great for me. It's free, distributed, encrypted, and you can have as much space as you want as long as you share a corresponding portion of your hard drive (you get 20 gigs online if you share 20 gigs of your HDD * the percentage of time your machine is online). It has clients for Windows and Linux, even Mac. You can keep all your files private or decide you want to share a few folders either with friends or with everybody. http://wua.la/
Hmmm. One great solution: http://www.rsync.net/
Their blurb:
"Business continuity and disaster recovery built on open standards and common sense.
- Simple backup/restore for Windows/Mac/Unix
- Encrypted backups, snapshots, IPV6, sshFS
- rsync, ftp/sftp/scp, rdiff-backup, WebDAV...
- Map as a drive letter or mount in the Finder
- Subversion/MySQL/Postgres/Exchange/SQL
Choose the only provider with geographic redundancy around the world.
Standard Offsite Filesystem: Full featured account, data stored in one backup location. $1.60GB/mo
Geo-Redundant Filesystem: Data is automatically replicated to a second, redundant location. $2.80GB/mo "
This is just advertising spam taken to the next level.
If you was bright you'd notice you don't control distributed backups.
Not having control of your own data is double plus ungood.
I.E. This is just someone shilling for a company.
-- Programming with boost is like building a house with lego. It's a cool but I wouldn't want to live in it
Either you have approximately three libraries of congress worth of data, or a very cheap cell phone bill. S3 storage is pretty cheap considering the redundancy and offsite and all that good stuff - 15c/GB-month, and 10c/GB for transfer in. So up to about 30GB or so worth of stored data, it's cheaper than Mozy ($5/mo), but I'd need to be storing over 400GB-month of data plus a good chunk of rsync transfer bandwidth before it would cost as much as my cell line.
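For anyone who wants to redo that arithmetic with their own numbers, here is a quick sketch. The prices are the ones quoted above (15c/GB-month storage, 10c/GB transfer in, Mozy at $5/mo); the function names are made up, and the cell bill isn't stated, so the 400GB case only shows what that much storage would cost.

```python
S3_STORAGE_PER_GB_MONTH = 0.15   # dollars, as quoted in the comment above
S3_TRANSFER_IN_PER_GB = 0.10     # dollars, as quoted in the comment above
MOZY_FLAT_PER_MONTH = 5.00       # dollars, as quoted in the comment above


def s3_monthly_cost(stored_gb, uploaded_gb=0.0):
    """Monthly S3 bill for stored data plus whatever was uploaded that month."""
    return stored_gb * S3_STORAGE_PER_GB_MONTH + uploaded_gb * S3_TRANSFER_IN_PER_GB


def breakeven_gb(flat_monthly_price):
    """Stored GB at which S3 storage alone costs as much as a flat-rate plan."""
    return flat_monthly_price / S3_STORAGE_PER_GB_MONTH


mozy_breakeven = breakeven_gb(MOZY_FLAT_PER_MONTH)   # roughly 33 GB
big_archive = s3_monthly_cost(400)                   # storage only, no transfer
```

Transfer charges push the real break-even a bit lower, since the first full upload and every changed file after it cost extra.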
And given the cell reception I get in S. Bumfuckville, NH, it'd probably be a much better investment.
Most servers are going to have some sort of monthly charge associated with them. A co-lo will charge you out the wazoo for this kind of thing - remember, rack space is expensive. Cheap hosts that offer plenty of oversold storage (they tend to be more concerned about CPU cycles though, so using it as a remote drive probably wouldn't be an issue) don't typically give you SSH access, and anything beyond a cheap host and you might as well just go for S3/jungledisk unless you have a tremendous shit-ton of data in which case just buy a couple of Drobos and befriend a UPS guy and a relative in a different state. With 1.5TB drives hitting the market, you can pack about 4TB of drive-failure-safe storage in each unit. But honestly if you have those kinds of redundant+offsite storage needs, you're probably not asking Slashdot how to achieve it.
How are sites slashdotted when nobody reads TFAs?
ObStdDisc: I work for the company I mention here... but suffice it to say that I left a very stable job to do so - so's to indicate that I do actually believe in the excellence of the product.
Keep an eye on Rebit. It doesn't do what you're asking about as of this moment... but (without treading into realms of "I'm not allowed to talk about that") I can safely say that the future holds some interesting things along this sort of direction.
Sig broken, watch for
restore-backup.com
rsync to Amazon S3 might be an option, if only for cross-platform capabilities. No versioning though, but outside of Apple's Time Machine (obviously useless for Windows and Linux), you're not going to get that without some major headache
Um, there are plenty of incremental backup tools dotted about, just upload the dumps?
Alternatively tarsnap is currently in beta testing, uses Amazon S3, and the client is written by the top FreeBSD security bod, with the client coming as source (though the service isn't free).
I'm testing it out right now for porn backup. So far, so good.
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
I'm not sure if it's precisely what you were looking for, but Allmydata Tahoe looks like an interesting possibility. It sounds like it meets most of your criteria, except versioning, but I think it exposes a FUSE interface, so you could probably just run a versioning filesystem like Git atop it.
unison is like rsync, but _mirrors_ changes on both sides, so you could create a star-type file distribution network based on a single backup location. So, anytime people drop a picture into their 'public pictures' folder, it will eventually get synced to everyone's 'public pictures' folders. It uses an rsync-like delta transfer under the covers, so it's efficient. Good documentation; didn't take too long for me to get it set up right on a mixed Windows/Linux/Mac environment. http://www.cis.upenn.edu/~bcpierce/unison/
First of all, thanks for all the feedback. I appreciate it. :)
It seems that I did not state my requirements well enough and there's some confusion about them. So, here goes again:
* The clients are distributed all over Germany, sometimes all over the world. DSL or better is available at all sites.
* Data synchronization must be asynchronous.
* Clients must have local shares, slices, working copies, whatever. They must work offline, as well.
* Data must be partitionable. While my sister and mother share all their pics, others do not.
* Automagic sync as soon as network connectivity is available is a plus. Requiring them to click a button is fine, as well.
* Almost all clients will be Windows.
* Total amount of storage used is a few dozen GiB. Using a few times that amount on the servers is fine.
* It's OK if brains need to be poured over this baby during initial setup.
* It's _not_ OK if tech knowledge is required during day-to-day operations.
* I will not be on site very often, depending on how far away they live.
* It's OK if they are not able to restore anything by themselves. I will just grab whatever and send it per eMail.
* I don't like S3 or other cloud implementations. That data is mine/theirs and not storing it elsewhere is better than encryption. Hard drives are too large, anyway. Storage space is not a concern, reproductions of the complete data make it fault tolerant.
Once again, thanks for all feedback :)
Sounds like tahoe could be what you want. However, it's pretty young, so I wouldn't necessarily rely on it yet.
SpiderOak https://spideroak.com/engineering_matters is a fault tolerant, fully encrypted remote backup, and supported Linux since day 1. It does block level dedupe, and preserves all historical versions, deleted items, etc. I use it to combine my backup of multiple machines into one archive. Only real drawback is it uses more CPU while archiving items than other archiving systems.
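To show what "block level dedupe" with full version history means in practice, here is a toy sketch. It is not SpiderOak's implementation: the class, the fixed-size blocks and the tiny block size are all assumptions made for illustration.

```python
import hashlib


class BlockStore:
    """Keeps every version of a file, storing each distinct block only once."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}     # SHA-256 hex digest -> block bytes
        self.versions = []   # each version is a list of block digests

    def add_version(self, data):
        """Record a new version and return its version number."""
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # new blocks only
            digests.append(digest)
        self.versions.append(digests)
        return len(self.versions) - 1

    def restore(self, version):
        """Reassemble any historical version from the shared block pool."""
        return b"".join(self.blocks[d] for d in self.versions[version])


store = BlockStore(block_size=4)
v0 = store.add_version(b"AAAABBBBCCCC")
v1 = store.add_version(b"AAAAXXXXCCCC")  # only the middle block changed
```

After the second version, the store holds four blocks rather than six, and both versions can still be restored. Hashing every block is also a plausible reason for the extra CPU use mentioned above.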
When I saw the words "easy", "reliable" and "distributed", I was expecting the punchline to be "choose any two".
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
I use carbonite. Small app, I can have multiple machines within the same account, unlimited data for something like $49/year. I got it for a work machine - and it has already been used to retrieve deleted files (very painless process), liked it so much that I got it for a couple of the family machines that I support. I set it up for them and the only instructions they have to remember is "don't save tax returns under c:\windows\system32, save them under My Documents".
If the g'vt kept the data on you that google does you'd better believe you'd be calling it "doing evil"
I watched their CTO's Google Talks presentation and it was really interesting. I got all excited, joined their beta only to realize that they - IMO - misused the technology they had and designed a rather mediocre product. Wuala wants to be a backup tool, a sharing tool, a social networking medium as well as few other things. In other words it lacks focus and wants to do everything - an approach that rarely works.
3.243F6A8885A308D313
http://www.openafs.org/
http://www.crashplan.net/ has done exactly what you describe. everyone in your 'backup network' backs up to each other, and for free. They make money from selling their own offsite backup. --Sam
Personally I use NovaBackup 10 from http://www.novastor.com/. I just use the DR image capability in it to go to my 'backup server' (NAS), as I found it more reliable than Ghost or True Image. Though I did test the Amazon S3 and FTP parts of it along with burning to DVDs, BD backups are still too expensive for my taste. It does do tape and other stuff, but isn't tape dead? :)
CrashPlan is an excellent option: supports Windows, OS X, and Linux, backups to other computers you own or trust (or even to CrashPlan servers, if you prefer), everything is encrypted, responsive dev team, etc, etc. Check out the video.
I've been using CrashPlan on my OS X machines for quite awhile, and it's been working nicely. It's not free (for the person that wants to do a backup), but they do have Windows, Mac and Linux clients. And it's completely free to run on machines that will only run as backup destinations.
It can't do your photo pool stuff, as far as I know, but the rest seems fine.
Make them all use Macs and use TimeMachine. flawless.
+1 vote for JungleDisk. I use it on my Windows and *nix machines and couldn't be happier. I really like the idea of paying for the software once (use on as many machines as you like, with free upgrades forever) and paying Amazon for storage "at cost." So many other internet services rely on oversubscribing limited resources, with heavy users eventually getting ejected in favor of more profitable clientele. With Amazon, I know I'm getting exactly what I pay for and they're not going to disappear with my data anytime soon.
Run an iSCSI target for each person you want to store info for. They can connect with their windowz and to them it acts like a slow hd. If you need security add vpn.
I have heard about this site and it looks darn easy to use:
http://www.crashplan.com/
They support Win, MacOS and Linux
If you have a "buddy" to store your backups then it'll be free otherwise to store stuff on their servers they charge a fee for it.
Darkk
S. Bumfuckville, NH? That must be near Jaffrey.
http://trac.manent-backup.com/ Easy: yes, after a first setup. Reliable - yes. Versioned: you bet! Actually, every backup you do is accessible as a different version, with a very little overhead.
A low power "set top" linux box; 1 lan, 1 wifi, 4 USB, no internal storage; plug in external usb drives. Tv out, usb audio & video in.
Serving; Email, web, files, printers, p2p, music & videos, video calling.
easy VPN between trusted boxes
easy sharing files (rsync over vpn)
easy sharing calendar & addressbook(with outlook, thunderbird integration).
The key is an easy and secure way to set up trusted vpns between multiple set top home servers to form friend & family networks. Perhaps email an URL "invite".
STFU, Twitter.
www.boxbackup.org
Encrypted online backup to a server you control.
(Repurposing a post I made to the VMWare Fusion forum...)
I've found CrashPlan ($25/seat) to do a pretty good job of cross-platform, Time-Machine-like, peer-to-peer backup between Mac, Windows, and Linux servers - with the added advantage of off-site backups (for a fee, from them, or for free, from your own machines and your friends - who don't have to buy CrashPlan, either).
On the Mac, like Time Machine, it appears to use the FSEvents system to back up only the changed files. (On Linux, it uses inotify, but that has some bugs; on Windows, I think it may use Volume Shadow Copy or something like that.) It stores only the portions of the files that have changed, in an xdelta-like format, so it's highly compressed and deduplicated (kinda like git). I sprang for the Pro version, which at $60 can keep any number of previous versions for any number of days, so you've really got point-in-time restore as far back as you want it. Best: It regularly checks the integrity of the backups. Anyone who's tried to do tape backups has discovered the joy of a corrupted backup file.
Downsides: It's a CPU hog, even on Apple's 64-bit Java 6 VM. You can set it to limit its own CPU when you're at the keyboard, but obviously, that slows down your backups, and it doesn't seem entirely accurate; on an 8-core Mac Pro, I've seen it use up 100% of a core even when it was theoretically limited to less than that. There's an upgrade coming in the next few weeks that's supposed to offer 400% faster backups with 30% CPU, so that may get better.
Also, the UI for restoring is a bit clunky, and forces you to go date-first, rather than tree-first; if you know you need an older version of a file, but don't know what the last "known good" version was, you're in for a lot of mousing. It has had a number of bugs (fewer lately) that cause it to lose track of which files have actually changed. This doesn't cause any problems, since your backup peer will store only the changed bytes (=0 bytes), but it does make the CPU problems worse, and waste a lot of disk and network bandwidth.
There are free automatic updates every few months, but they're forced and unannounced, which gives me the willies a bit. (I don't know if the enterprise version has more control over that. BTW, the enterprise version is named "Pro Server", not to be confused with the home "Pro" version I bought, which of course has a server component as well..) Also, although your backups are encrypted, the logs (which are apparently either sent to, or retrievable by, their support team) have your filename and pathnames in them, which is a pretty big privacy leak that I've alerted them to.
That said, having once done a complete tour of EVERY Windows backup solution, from free to $10K, and finding them all pathetically lacking and buggy, and nonetheless having bought my own DDS-4 drive and, later, VXA-2 10-tape carousel, and nonetheless still having had to send drives off to OnTrack three or four times... CrashPlan is the best damn backup I've seen, and the only one that's been hands-free enough to use and rely on, and for under $100 it's crazy.
Even though I've never used it, I can say that from years of reading the K12LTSP listserve, BackupPC has always been mentioned as a good way to backup multiple machines. It supports Windows and Linux and you can browse the archives via a web browser.
http://www.getdropbox.com/ - Store, Sync and share your files - versioning is standard. Piss easy to set up on Windows, Linux and Mac.
Not much more to say :)
have a look at www.datacastlecorp.com as their backup service works well for friends and family.
(Assuming windows clients)
I use "Super Flexible File Synchronizer" (www.suplerflexible.com) and schedule it for nightly uploads to my S3 account. It runs as a service and has more options than you can shake a stick at; it really is super flexible.
I run it on my server to back up about 600mb of databases and websites every night. I keep two weeks of copies on S3 and my monthly bill from Amazon is about $1.25. That is not a typo, it's a little over a dollar a month for no fuss backups.
Box.Net (they have 2 million users) has everything that you listed above, except desktop clients, which they claim are under development.
You've got online bacon storage? Wow! The wonders of the internet never cease.
http://sonicwall.com/us/products/2057.html
CDP detects new or changed files, even when files are open. When this information is found, CDP immediately and automatically replicates it to dedicated hardware locally. Unlike most traditional backup products, no user intervention or additional software or hardware is required with CDP.
It also keeps track of 15 versions of a file, so you can restore any or all of them.
Organization: alphabetical, sometimes numerical or messy
Right here. Good stuff.
I use JungleDisk and the Amazon S3 service. This solution does everything you're asking for. It doesn't cost much, and you take on the S3 costs.
Rsync can be versioned just fine. You tell it to rename the old version of foo.jpg to foo.jpg.back. After the backup runs, another script runs on the server that pulls the mtime of foo.jpg.back and uses that to rename it to foo.timestamp.jpg.
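The server-side rename step could look something like the sketch below. It assumes the old copies end in ".back" (rsync's --backup and --suffix options can produce that), and the function names and timestamp format are my own choices, not the original script.

```python
import os
import time
from pathlib import Path


def timestamped_name(backup_path, suffix=".back"):
    """Turn foo.jpg.back into foo.<mtime>.jpg, using the file's own mtime."""
    original = backup_path.name[:-len(suffix)]       # foo.jpg
    stem, ext = os.path.splitext(original)           # foo, .jpg
    mtime = backup_path.stat().st_mtime
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(mtime))
    return backup_path.with_name(f"{stem}.{stamp}{ext}")


def rename_backups(root, suffix=".back"):
    """Rename every *.back file under root; returns the new paths."""
    renamed = []
    # sorted() collects the matches first, so renaming does not disturb the walk.
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        if path.is_file():
            target = timestamped_name(path, suffix)
            path.rename(target)
            renamed.append(target)
    return renamed
```

Run it from cron on the server after each backup window, e.g. `rename_backups("/srv/backups/alice")` (a hypothetical path). Two versions saved within the same second would collide, which a real script would have to guard against.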
I implemented this in my last job where home directories were served by samba. The clients were winsooze. I had a large list of directories from the users' profile directory that were not backed up, because they changed too often, and few users would be aware of their existence. (browser caches, doc temp files...)
Anyway the backup folder for a user would typically be 2-3 times the size of their primary directory over the course of a year.
This size was maintained by another script that pruned the backups. The weekly prune removed all but the last version of the previous week. The monthly prune removed all but the last version of the previous month. The quarterly prune removed all but the last version of the previous quarter.
So at any given time, you usually had dailies for this week and last week, weeklies for the current and previous month. Monthlies for the current and previous quarter.
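Here is one way to read that retention scheme as code. This is my interpretation rather than the original prune script: keep everything from this week and last week, then the newest version per week through the previous month, the newest per month through the previous quarter, and the newest per quarter beyond that. All names are made up.

```python
from datetime import date, timedelta


def monday_of(d):
    """Date of the Monday that starts d's week."""
    return d - timedelta(days=d.weekday())


def quarter_index(d):
    """A running count of quarters, so consecutive quarters differ by 1."""
    return d.year * 4 + (d.month - 1) // 3


def bucket(d, today):
    """Retention bucket for a version dated d; one version survives per bucket."""
    weeks_ago = (monday_of(today) - monday_of(d)).days // 7
    months_ago = (today.year - d.year) * 12 + (today.month - d.month)
    quarters_ago = quarter_index(today) - quarter_index(d)
    if weeks_ago <= 1:
        return ("day", d)                      # dailies: this week and last
    if months_ago <= 1:
        return ("week", monday_of(d))          # weeklies: this month and last
    if quarters_ago <= 1:
        return ("month", d.year, d.month)      # monthlies: this quarter and last
    return ("quarter", quarter_index(d))       # older: one per quarter


def versions_to_keep(versions, today):
    """Return the set of version dates that survive pruning."""
    newest = {}
    for d in versions:
        key = bucket(d, today)
        if key not in newest or d > newest[key]:
            newest[key] = d
    return set(newest.values())
```

Everything not in the returned set gets deleted. Since each bucket keeps only its newest member, running the prune more often than necessary does no harm.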
Implementation:
Assuming you have shell access on a unix server somewhere that you have space, then you need to establish a communication channel between the client and the server.
You need to put ssh and rsync on the client, and then either run rsyncd as a service on the client and pull from the server, or run some form of cron on the client and push from the client.
If you have samba running on the server, you have a browse mechanism for users to restore files. Store the server's location as one of the user's "Network Places"
Of course all of this is run either through some form of VPN or each channel runs through ssh.
Caveats: As with many backup systems, this doesn't handle changes in the file system in a fully robust manner. There will be errors if the user moves or renames a file between the script starting and the file being backed up. Since this system keeps multiple versions, and since backups are handled at the file level, the occasional single bad file or missed directory is not the end of the world -- it just means that a less recent version is used.
It also doesn't handle huge files, such as large MS Access or Photoshop files, with complete grace due to their size. However, someone who uses these programs really needs an in-house backup system.
In use the most common request from my users was for a file from 2-3 days ago. Usually what had happened was that they logged out without saving, and large files would not be saved completely. Most of the time this happened with powerpoint files.
Third Career: Tree Farmer Second Career: Computer Geek First Career: Teacher, Outdoor Instructor, Photographer.
Take a P2P system which splits up files and distributes the contents across the net already. Put an encryption front-end onto it so that any files written into it are encrypted. Then put a front-end onto all of that which allows you to "mount" the whole thing as a virtual drive. Anyone who wants to have "X"Gb of data storage on the system needs only buy an "X"Gb drive and make it available to the system. In exchange for making "X" Gb available, you would have your files written (in encrypted form) to "the net" and automatically spread out across the Internet. If you want those files mirrored, then you can make "2X"Gb available in exchange for "1X"Gb of space, mirrored. Just doesn't seem as though it would be that hard to do given all of the P2P software out there which already does a lot of this. I just, unfortunately, do not have the time to do it. Sigh.
Saw three suggestions about Tahoe so far, low scores though. So I thought I'd just add my own low score suggestion about Tahoe (http://www.allmydata.com/).
CrashPlan -
It's entirely cross platform - Windows, Mac, Linux and Solaris.
It uses far less bandwidth than rsync, doesn't slow your systems down, backs up differentially in real time, and lets you go off-site to multiple destinations.
It has faster backup & restore than Mozy or Carbonite.
If you use their online service, you can put as many computers as you want under a single service for $0.10/GB/month, which is cheaper than S3.
Oh, and it has unlimited versioning and it's 64 bit clean meaning no filesize limits.
I've heard good things about Spider Oak (first 2GB free) so you might check them out. (disclosure, some of the people who work there are friends of mine)
Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?