Synchronize Data Between Linux, OS X, and Windows?
aaaaaaargh! writes "I'm using a laptop with Ubuntu 8.04 for work, a netbook with Ubuntu 9.10 when I'm outside, Mac OS X 10.5 for hobby projects, and Windows XP for gaming. For backups, I'm currently using Jungle Disk and Apple's Time Machine, and I use a local svn repository for my work data. Now I need to frequently exchange and synchronize OpenOffice and LaTeX files and source code in various cross-platform programming languages between one machine and another. Options range from putting everything online (but Jungle Disk disks seem to be too slow for anything other than backup), to storing my data on external media like USB sticks or SD cards, to working with copies by synchronizing folders over the network. I don't want to give my data away to some outside server without strong encryption (controlled by me, including the source code), and external media like USB sticks are a bit too fragile for my taste. The solution should be reliable, relatively failsafe, as simple as possible, and allow me to continue to use Jungle Disk for backup. So what would you recommend?"
USB sticks are a bit too fragile
damn straight. that is the number one problem with USB anything. i've seen more broken jump drives, and more broken usb ports from someone tripping over usb cable, than i care to fix. yes, they ARE handy as can be, but to WHOMEVER is designing USB 4 or whatever it will be called, PLEASE make the damn connection more sturdy.
End of discussion.
Don't worry about the encryption overhead, just make sure AES is the first cipher chosen. AES is VERY fast.
Test your net with Netalyzr
Are you insane?
The image is random data, and adding or removing a small file in the image would require a complete resync.
Agreed - but if rsync fits, it is preferable.
That said, the constraints where rsync works flawlessly are pretty strict:
1. You always rsync down at the start of a session
2. You always rsync up at the end of a session
3. You never have more than 1 session
Just make sure you use the best settings - don't forget --delete or whatever it is to handle removed directories.
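For what it's worth, here is a minimal sketch of that pull-at-start, push-at-end routine in Python. The paths, the host name and the rsync_command helper are made up for illustration; the flags (-a, -z, --delete, --dry-run, -e ssh) are standard rsync options. It only builds and prints the commands, so nothing gets deleted until you decide to run them.

```python
import shlex

def rsync_command(src, dest, delete=True, dry_run=False):
    """Build an rsync argument list that mirrors src into dest (hypothetical helper)."""
    cmd = ["rsync", "-az"]            # -a: archive mode, -z: compress in transit
    if delete:
        cmd.append("--delete")        # remove files at dest that are gone from src
    if dry_run:
        cmd.append("--dry-run")       # only report what would change
    cmd += ["-e", "ssh", src, dest]   # run the transfer over ssh
    return cmd

# Hypothetical paths and host, for illustration only.
LOCAL = "/home/me/work/"
REMOTE = "me@server.example:work/"

pull = rsync_command(REMOTE, LOCAL)   # start of session: rsync down
push = rsync_command(LOCAL, REMOTE)   # end of session: rsync up

print(shlex.join(pull))
print(shlex.join(push))
```

To actually run one, pass the list to subprocess.run(cmd, check=True), ideally after a pass with dry_run=True, since --delete will happily remove anything the other side doesn't have.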
Unison works much better due to its 2-way change propagation, but it is only designed to handle 2 sources of documents, not 3.
I've never had to handle this sort of thing with unison, so though it may work, I'm not sure of the mechanisms it uses to handle resolution in batch mode. If it's timestamp based, you could hose yourself.
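To show why timestamp-only resolution is risky, here is a rough sketch of the alternative: compare each replica against a record of the last synced state instead of comparing clocks. This is not a claim about how Unison resolves things in batch mode, just an illustration of the general idea; the reconcile function and the hash strings are hypothetical.

```python
def reconcile(base, a, b):
    """Decide what to do with one file given three content hashes.

    base: hash recorded at the last successful sync (None if the file did not exist)
    a, b: current hashes on replica A and replica B (None if the file is missing)
    """
    if a == b:
        return "nothing"        # replicas already agree
    if a == base:
        return "copy B -> A"    # only B changed (a missing b means: delete on A)
    if b == base:
        return "copy A -> B"    # only A changed (a missing a means: delete on B)
    return "conflict"           # both sides changed differently: ask the user

print(reconcile("h1", "h1", "h2"))   # only B was edited
print(reconcile("h1", "h3", "h2"))   # both were edited
```

The point is the last branch: when both sides changed, a "newest timestamp wins" rule silently throws one edit away, whereas a three-way comparison can at least flag it.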
If you make the Mac and XP boxes share your data over nfs, you restrict yourself to the 2 source case, and Unison should work fine.
Of course when someone steals your laptop which is syncing to dropbox, the data is theirs. You can unlink updates to the stolen device but the data is gone. I'd love a remote wipe facility.
Dropbox is good up to a few gigs, but their pay service is expensive per GB. I presently use it with SyncToy (scheduled) to keep stuff up-to-date between machines... but only for a small set of important stuff.
For larger things, the OP's selection of Jungle Disk is far more cost effective... and if speed is an issue I wonder if he/she knows that Jungle Disk can use EITHER Rackspace or Amazon as its backend. They should try the other.
Syncing data is a hard problem. It doesn't look like it could be a problem at first, but there is only a surprisingly small number of products which come close to solving it. Different metadata across platforms (time resolution and time zone, different types of time records, permissions, naming etc.), extended file structures (streams, resource forks, ...), comparatively slow and intermittent network connectivity, concurrent changes in more than one place, huge binary files with only small changes, renamed or moved files, open file handles, the list goes on and on. Finding a sync tool which satisfies your requirements is just as hard as finding a reliable backup solution: They are similar problems and there are heaps of tools for both, but only very few which are even worth considering. The rest are half-assed marketing-driven data-eating annoyances. (BTW, I use rsync, but it has deficiencies. For example, it does not deal with open files.)
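A small example of the time resolution issue: FAT file systems store modification times in 2-second steps, so a file copied to a USB stick and back can look "changed" to a naive comparison. A common workaround is to compare with a tolerance, which is roughly what rsync's --modify-window option does. The sketch below is only an illustration; the function names and the 2-second default are my own choices.

```python
import os

def mtimes_match(t_a, t_b, tolerance=2.0):
    """True if two modification times (in seconds) differ by at most `tolerance`."""
    return abs(t_a - t_b) <= tolerance

def same_mtime(path_a, path_b, tolerance=2.0):
    """Compare the modification times of two files on disk, allowing for coarse clocks."""
    return mtimes_match(os.stat(path_a).st_mtime,
                        os.stat(path_b).st_mtime,
                        tolerance)

# A copy whose timestamp was rounded by the file system still counts as unchanged.
print(mtimes_match(1000.0, 1001.9))
# A difference larger than the window is treated as a real change.
print(mtimes_match(1000.0, 1003.0))
```

This handles resolution only; time zone shifts, permissions and resource forks need their own handling.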
There's also the question of whether you really want to store things online. If I want to sync a few MBs of documents, syncing over the Internet to a server might be fine. On the other hand, if I want to keep 100 GB synced between two computers on the same network, pushing that over a 512k Internet connection just to download it again might be less than ideal.
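To put a number on that, here is the back-of-the-envelope arithmetic, assuming "512k" means 512 kbit/s, 1 GB means 10^9 bytes, and the link is fully used with no protocol overhead (all assumptions of mine, and optimistic ones).

```python
def transfer_days(size_bytes, link_bits_per_second):
    """Days needed to move size_bytes over a link of the given raw bit rate."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86400            # 86400 seconds per day

days = transfer_days(100 * 10**9, 512 * 10**3)
print(f"{days:.1f} days one way")     # prints "18.1 days one way"
```

So the initial upload alone takes about two and a half weeks, and the other machine then has to pull it all down again.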
Simple? Just set up ZFS on an OpenSolaris box?
None of those protocols were designed to work well over the public internet. Sure, NFS/SMB/iSCSI are IP protocols but they don't work well over high latency and low bandwidth connections.
I'm not arguing with your choice of ZFS; I would just look at something simpler, like rsync over ssh to a ZFS volume, instead of sharing NFS/CIFS/SMB/iSCSI out to the Internet.
"When the president does it, that means it's not illegal." - Richard M. Nixon
So why bother? Why not use distributed VCS?
They are quite easy (to learn), they handle merges apparently much better, they give version history, rollback, branches, etc. After all, the OP was using SVN.
BTW, from my perspective your own post is more typical of Slashdot than my own: theorizing without data, postulating ignorance on the part of those with experience, and generally trying to adopt a tone of undue authority.