BitTorrent For Enterprise File Distribution?
HotTuna writes "I'm responsible for a closed, private network of retail stores connected to our corporate office (and to each other) with IPsec over DSL, and no access to the public internet. We have about 4GB of disaster recovery files that need to be replicated at each site, and updated monthly. The challenge is that all the enterprise file replication tools out there seem to be client/server and not peer-to-peer. This crushes our bandwidth at the corporate office and leaves hundreds of 7Mb DSL connections (at the stores) virtually idle. I am dreaming of a tool which can 'seed' different parts of a file to different peers, and then have those peers exchange those parts, rapidly replicating the file across the entire network. Sounds like BitTorrent you say? Sure, except I would need to 'push' the files out, and not rely on users to click a torrent file at each site. I could imagine a homebrew tracker, with uTorrent and an RSS feed at each site, but that sounds a little too patchwork to fly by the CIO. What do you think? Is BitTorrent an appropriate protocol for file distribution in the business sector? If not, why not? If so, how would you implement it?"
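For illustration, the "seed different parts to different peers" idea in the question can be sketched as a simple round-robin split. This is only a sketch: a stock BitTorrent client does not assign pieces this way (peers request pieces themselves), and the function name and peer names below are made up for the example.

```python
def assign_initial_pieces(num_pieces, peers):
    """Spread piece indexes across peers round-robin.

    Hypothetical helper: each peer gets a different slice of the file
    from the origin, and would then trade pieces with the other peers.
    """
    assignment = {peer: [] for peer in peers}
    for piece in range(num_pieces):
        assignment[peers[piece % len(peers)]].append(piece)
    return assignment


plan = assign_initial_pieces(7, ["store-a", "store-b", "store-c"])
for peer, pieces in plan.items():
    print(peer, pieces)
```

With this split the corporate link uploads each piece only once, and the stores fill in the rest from each other.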
The bandwidth of a DVD in the postal service isn't great but it's reasonable and quite cost effective.
Fucking BitTorrent, it is reducing me my sells.
No need to get fancy with an "RSS feed". rTorrent, at least, can be configured to monitor a directory for .torrent files and automatically start downloading when one appears. You could set this up, then simply push out your .torrent file to each site with something like scp or rsync.
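A rough sketch of the "push the .torrent into a watched directory" step, assuming each store runs a client that watches a directory as described above. The user name, host names, and watch directory are placeholders, and the script only prints the scp commands rather than running them.

```python
import shlex


def build_push_commands(torrent_path, hosts, watch_dir, user="deploy"):
    """Return one scp argument list per store host.

    The user, hosts, and watch_dir values are assumptions for the example.
    """
    return [
        ["scp", "-q", torrent_path, f"{user}@{host}:{watch_dir}/"]
        for host in hosts
    ]


commands = build_push_commands(
    "dr-update.torrent",
    ["store001.example.internal", "store002.example.internal"],
    "/var/torrents/watch",
)
for cmd in commands:
    # Printed only; hand cmd to subprocess.run(cmd, check=True) to really push.
    print(shlex.join(cmd))
```

Once the file lands in the watched directory, the client at that site should pick it up on its own, so no user has to click anything.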
Ask a warez site.
Wouldn't a dedicated server provide what you need? Upload your recovery files once and then have the server transfer them to each client at high speed. Simple and cost effective.
When I read about the evils of drinking, I gave up... reading.-Henny Youngman
These are technologies that people everywhere have proven effective when used together. If you put it together, test it, and build in fail-safes, you should be fine!
Keep the faith, share the code
Next time you should ask at the official BitTorrent IRC channel.
The Python BitTorrent client, which runs on Unix, has a version called "launchmany" which is easily controlled via script. It should fit your needs very nicely.
4GB of files once per month, why bother using the network?
BitTorrent is an excellent intranet content-distribution tool; we used it for years to push software and content releases to 600+ Solaris servers inside Microsoft (WebTV).
-j
Sure! BitTorrent, remember, is only a protocol, it's just become demonized due to the types of files being shared using it. But if you're sharing perfectly legitimate data, then what's wrong with using a protocol that's already been extensively tested and developed?
Just because it's been used to pirate everything under the sun doesn't make it inappropriate in other arenas.
How much do these disaster recovery files change every month? If they stay mostly the same, using rsync (or some other binary-diff capable tool) may let you keep your simple client/server model while bringing bandwidth under control.
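To get a feel for how much a binary-diff approach could save, here is a simplified sketch that compares two versions block by block. It is not the rsync algorithm (rsync uses a rolling checksum so it can also cope with inserted or shifted data); the block size and the sample data are assumptions.

```python
import hashlib


def block_hashes(data, block_size):
    """Hash each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4096):
    """Return indexes of blocks in `new` that differ from `old` or are new."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [
        i for i, h in enumerate(new_h)
        if i >= len(old_h) or old_h[i] != h
    ]


old_version = bytes(16384)            # four 4 KiB blocks of zeros
new_version = bytearray(old_version)
new_version[5000] = 1                 # flip one byte inside block 1
to_send = changed_blocks(old_version, bytes(new_version))
print(f"{len(to_send)} of 4 blocks need to be transferred: {to_send}")
```

If last month's and this month's files mostly agree, only the changed blocks have to cross the corporate link.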
I've seen BitTorrent used for several business-critical functions. One example is World of Warcraft distributing updates with it.
Must be good enough for the rest of us.
It is like rsync on steroids. Cisco's WAN Optimization and Application Acceleration product allows you to "seed" your remote locations with files. It also uses a technology called Data Redundancy Elimination that replaces large data segments that would be sent over your WAN with small signatures.
What this means in a functional sense is that you would push that 4 GB file over the WAN one time. On any subsequent push you would sync only the bit-level changes, effectively transferring only the 10 megabytes that actually changed.
While it is nice to get the propeller spinning, there is no sense reinventing the wheel.
Cisco WAAS - http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html
Colin McNamara - CCIE #18233 "The difficult we do immediately, the impossible just takes a little longer"
Azureus, for instance, will happily check a directory regularly for torrents and just start downloading those. It should be trivial to apply some sort of external mechanism to PUTting such torrents in place on needed computers.
DHT or the like might seed your files outside the company. OK, I'm too lazy to work out whether that really is a threat, but I'm not sure that BitTorrent is appropriate for data that you don't want to end up in the public domain.
You could probably rig up a system where scripts check secure FTP servers for updates, and download them. Cascade the SFTP servers so that each one feeds out to two more, geographically close ones and you'll be ok. If possible only download diffs, not the whole thing. And find an SFTP client which will pull several files at a time since that gives better throughput on high latency connections which are window size limited.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Personally I like the portable media shipment suggestions. But if your CIO/company requires enterprise software from a large vendor with good support, have a look at IBM's Tivoli Provisioning Manager for Software:
http://www-01.ibm.com/software/tivoli/products/prov-mgrproductline/
Besides the usual software distribution, this package has a peer-to-peer function. It also senses bandwidth. If there's other traffic it slows down temporarily so it won't saturate the link. Once the other traffic is done (like during your off-hours or maintenance windows) it'll go as fast as it can to finish distributing files.
Quantum mechanics: the dreams that stuff is made of.
Get it pre-built and externally supported. It'll be a lot easier to fly by your CIO.
The solution you suggested makes sense.
1. RSA keys are shared across the network.
2. A new file becomes available on your "central" server and is placed into a directory automatically shared by a bt client on the central server.
3. A simple script on the central server checks a list of servers it needs to update, and tells each of them to initiate a transfer using the bittorrent protocol.
4. ???
5. Profit.
Haven't you been reading the warnings around here about how bad it is for the Internet? If big business starts using BT we'll microwave the baby!
We do something similar using WAFS by GlobalScape (previously Availl).
http://www.globalscape.com/wafs/
It provides bit-level updates to data either on a schedule or continuously, and can keep a specified file version archive too. The continuous update to HQ should keep DSL utilisation low.
Have you thought about building up a distribution tree for your sites?
Group all of your stores based upon geographic location. State, region, country, etc. Pick one or two stores in each group and they are the only ones that interact with the parent group.
E.g. Corporate will distribute the files to two locations in each country. Then two stores from each region will see that the country store has the files and download them. Repeat down the chain until all stores have the files.
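A minimal sketch of that kind of distribution tree, assuming stores are already grouped by region and the first one or two stores in each group act as relays. The region names, store names, and the "corporate" label are invented for the example.

```python
def build_distribution_plan(stores_by_region, relays_per_region=2):
    """Map each store to the host it should download from.

    Relays pull from corporate; every other store in the region pulls
    from one of its region's relays, spread round-robin.
    """
    plan = {}
    for region, stores in stores_by_region.items():
        relays = stores[:relays_per_region]
        for relay in relays:
            plan[relay] = "corporate"
        for i, store in enumerate(stores[relays_per_region:]):
            plan[store] = relays[i % len(relays)]
    return plan


plan = build_distribution_plan({
    "north": ["n1", "n2", "n3", "n4", "n5"],
    "south": ["s1"],
})
for store, source in sorted(plan.items()):
    print(f"{store} downloads from {source}")
```

Adding more levels (country, region, state) would just mean repeating the same step with the relays of one level acting as the sources for the next.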
Why not spread out the backups? Limit the bandwidth of the backups to leave enough room for regular traffic, and have different stores send their backups on different days.
I think you should check out http://www.netwinsite.com for a product called Surgemail. It has a built in program called Surgeplus, a client that synchronizes any folders you want. You could create a script on each machine to do daily archival changes and have Surgeplus automatically upload these files to Surgemail.
I know this is a mail server, but it is free to use for up to 5 accounts. You can log into the same account from every site, so each offsite location ends up with a backup of all the files in a distributed manner.
So the storage folders you want to back up to would look like this:
c:\Offsite\Store001 -- current store backup is stored in
c:\Offsite\StoreXXX -- Other stores will sync into these folders automatically (will automatically create it from the server)
On the server you would have the folder you want to store it in as: ...
private/Backup/Store001
private/Backup/Store002
private/Backup/Store003
BEST OF ALL, THIS IS FREE for under 5 users. Since you only need to login as the same user, it will be synchronized across all of your remote stores giving you full offsite secure backup. No need to pay for offsite backup services.
This is also a full fledged mail/calendar server so if you want to use that portion too, it is the least expensive mail server to use internally.
Check it out and give it a try. My clients love it!
*Headline News* censorship shuts down the Internet! More at 6PM!
If you're using Windows XP or above, take a look at the built in tool "BitsAdmin."
Is BitTorrent an appropriate protocol for file distribution in the business sector?
WTF does the business sector have to do with what protocol you use?? The business won't give a crap if you used smoke signals, as long as they get their shit when they need it.
I could imagine a homebrew tracker, with uTorrent and an RSS feed at each site, but that sounds a little too patchwork to fly by the CIO
Did you really need to submit an article to figure this out??
with IPsec over DSL, and no access to the public internet.
Unless you have very long wires, some box is going to route them. Are those your own?
Otherwise, your ISP's router, diligent in separating traffic though it may be, can get hacked.
Why am I saying this? Not to make you don your tinfoil hat, certainly, but just to point out that if the scenario is as I describe, you're not 100% GUARANTEED to be invulnerable. Maybe a few tinfoil strips in your hair would look nice... ;)
About the actual question: bit torrent would probably be fine, but if most of the data is unchanged between updates, you may want to compute the diff and then BT-share that. How do you store the data? If it's just a big tar(.gz|.bz2) archive, bsdiff might be your friend.
If you push from a single seeder to many clients, maybe multicast would be a good solution. But that's in the early design phase I think, which is not what you need :)
Best of luck!
...is quite straight forward in fact.
This has many advantages:
The beauty of this system is that it relies heavily on existing technology (BitTorrent, RSS, GnuPG, etc), so you can just throw together a bunch of libraries in your favourite programming language (I would use Python for myself), and you are done. Saves you time, money and a lot of work!
Furthermore you do not need to have a VPN set up to every destination as your files are already encrypted and properly signed.
Another advantage is: As this is a custom-built system for your use-case it should be easy to integrate it into your already existing one.
Meme of the day: I browse "Disable Sigs: Checked". So should you.
Your best bet is multicast, there are programs for software distribution that use multicast.
and you can find documentation for it here:
http://www.cs.cmu.edu/~dga/papers/dsync-usenix2008-abstract.html
It is rsync on steroids that uses a BitTorrent-like P2P protocol that is even more efficient because it exploits file similarity.
You may have to contact the author of the paper to get the latest version of dsync, but I am sure they would be more than happy to help you with that.
I'd get a station wagon and fill it with tapes. Go on, mod me "-1 old fashioned"
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
You should take a look at cleversafe.org - it's an opensource 'dispersed storage' infrastructure which allows you to slice up files and distribute them across a network of storage servers. Not sure if this would get you what you want, but it's worth looking into.
I like the bittorrent idea more... but if you're looking for something simple and free - Foldershare. Not sure if this works for you, but I use Foldershare to sync files between several of my offices. It is peer to peer, with a central server to initiate the connection. If you have a 4GB file, perhaps you could rar it into smaller pieces, then this could work for you. If you don't have an internet connection though, this totally won't work for you. Heh.
You don't say if the files are changed at the remote sites, or just at head office.
Rsync is an option - have 10 remote sites replicate from the master, then have other stores replicate from the submasters.
You don't say if you're running windows, but the distributed file system works pretty well. Supports remote differential compression.
Sounds like a problem that multicast-based file transfer is designed to solve. http://www.tcnj.edu/~bush/uftp.html You said IPsec VPNs, but is it just IPsec, or is it GRE inside IPsec? If there's no GRE, then forget what I said.
CIOs are notoriously conservative. Any solution you suggest that involves building a solution from scratch will scare them. The solution is to use existing proven technology. In the MS Windows world, at least, root kits have been distributing updates successfully for years. You should be looking at simply modifying an existing root kit to your requirements.
Are you using IPSec in Tunnel mode or Transport mode? If you're using it in tunnel mode, then you're not going to fix your bandwidth problem, because all data has to go through corporate HQ anyway because that's where the tunnels end.
Um, ok the data goes down and you have it everywhere but the main site that needs to download it again at a capped rate. How do you get it back to the hosting data site rapidly enough to be useful?
An encrypted USB memory key and a stamp go a long way.
Seriously, it's a good idea and nice in practice, but have you ever tried sitting on your hands while a boss with a whip watches you download the company files at 150k/s? If this is so you can back up your branch office sites and restore remotely, that is fine. I just wouldn't want a 3hr downtime to show up on my record while you retransmit data.
I actually have backups go to other sites across the nation now, because of hurricane damage or in case the world ends and a future civilization's life hangs in the balance of our spreadsheet data.
It is more my last line of defense.
Most VPN setups like this are hub and spoke, with the central office being the hub. So connections that go from one remote site to another still have to go through the central office, and you still have a bandwidth problem there. If you have your VPN set up as a mesh, so each site has connections to multiple other sites, you might be able to get this to work. The problem you run into then is that most inexpensive VPN solutions can only handle so many VPN tunnels before they run out of CPU. Not knowing what you used as the VPN concentrator for your remote offices, I can't say whether this would be a problem.
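Some back-of-the-envelope arithmetic for the hub-and-spoke case, as a sketch. It assumes every store-to-store byte is relayed through the central office, that HQ seeds exactly one full copy, and it uses 300 stores as a stand-in for "hundreds"; none of those numbers come from the submitter.

```python
def hub_traffic_gb(num_stores, file_gb, hq_seed_copies=1.0):
    """Traffic on the HQ link when all peer traffic is relayed through HQ.

    Returns (egress_gb, ingress_gb) for the central office.
    """
    total_delivered = num_stores * file_gb
    from_peers = total_delivered - hq_seed_copies * file_gb
    egress = total_delivered   # every delivered byte leaves via the hub
    ingress = from_peers       # peer-sourced bytes first travel up to the hub
    return egress, ingress


def mesh_hq_egress_gb(file_gb, hq_seed_copies=1.0):
    """HQ upload when stores can reach each other directly."""
    return hq_seed_copies * file_gb


egress, ingress = hub_traffic_gb(num_stores=300, file_gb=4.0)
print(f"hub and spoke: HQ sends {egress} GB and also receives {ingress} GB")
print(f"full mesh:     HQ sends {mesh_hq_egress_gb(4.0)} GB")
```

Under these assumptions, P2P over hub and spoke leaves the HQ upload unchanged and adds nearly as much inbound traffic on top, while a mesh cuts the HQ upload to roughly one copy.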
Why not use the Hadoop distributed file system? It offers automatic replication and you can treat each "store" as a "rack" to guarantee multiple remote backups.
You also get the immediate advantage of having a single file namespace and instant streaming access to all of the files from any single location.
The only advantage to BitTorrent that I can see is faster recovery time, since a single store can source the backup from N other stores (instead of 2, or whatever number of replicas you have decided on).
Sub.tv use bittorrent to distribute large video files to plasma screens in student unions - they auto-download - IIRC, it's an older Azureus client, presumably written with a plug-in, that ran on an always-on windows box.
It seems an entirely appropriate mechanism for it, and they're already doing what you seem to want!
I've set up something similar to this. You almost certainly don't need to transfer ALL of the 4GB every month. You just need to update a copy in the corporate office with all of the changes from the locations. Rsync is the answer. It figures out what's changed and only transfers the changed stuff, which is typically a trivial amount. Rsync is a brilliant piece of work. It's made for exactly the sort of thing you're trying to do. It will work so well you'll think there's some kind of quantum voodoo going on. Also, check out rdiff-backup. There's a version for Windows, and you can rsync easily between Windows and *nix. If security is an issue (and it sounds as if it isn't) you can rsync over ssh, too.
I have used commercial packages like the Enterprise Backup Solution we already use to backup data to tape to mirror files. Even across a SLOW AS CHRISTMAS T1 connection it works VERY well to only copy the files that change on a daily basis. So, unless you are modifying GIGS of data at-a-time, keep it simple.
Lotus Domino! Its replication keeps databases/websites and the documents/files contained within them in sync across multiple servers. You can specify how the data is distributed across the network with connection documents.
Set up a cheap file server in a datacenter, hook it up to your VPN network and store all backups there. Use rsync: very fast, and it uses SSH nowadays for auth and encryption. Encrypt the whole backup partition (dmcrypt, truecrypt, etc.) and keep the key private. Manual mount and key entry after rebooting. That way datacenter operators can't (easily) gain access to the files. Or transfer already encrypted files, though that will destroy rsync performance.
Set SSH and all the other services to listen on the VPN IP only, making the machine invisible to the common internet.
Not as fancy as Peer-To-Peer distribution, but very reliable and fast. Also you get less administrative headaches, I think.
In Windows Server 2003 R2/Windows Server 2008 they really improved DFS. It lets you set up throttling in 15 minute increments, and with Full Mesh replication it decentralizes your replication, kind of like BitTorrent. However, you have to make sure you don't accidentally use FRS, because it sucks. Where I work we have 5 branches that pull data from our data center. I have DFS replication set up so I can have all our software distribution at the local site. I need to keep the install points at all the sites the same, so I use DFS to replicate all the data, then to get to it I type \\mydomain.com\DFSSharename. Active Directory determines what site I am in, then points me to the local share. If the local share is not available, it points me to the remote share, or to a secondary share in the same site, so it gives you failover for your file servers. If you don't have any Windows boxes, this won't work, and this really locks you into Microsoft, but it won't cost you anything more than what you have already paid. Below is a link to Microsoft's page with more information, including how to set it up: http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/default.mspx
Curious about Storage and Virtualization? Check out
Use an existing service to provide it: http://bitsrepublic.com/
You could set up an NFS distributed file system. That may be more amenable to your boss and will have other advantages too.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
www.cleversafe.com
Take a look at your company's network topology. If it is a typical branch setup, like "hub and spoke" where your branches are all remotely connected through the central head office, then BitTorrent will waste bandwidth. Why have a peer to peer application like Bittorrent routing traffic from a branch, up to the head office, and back down to another branch? You do not want to impact other applications running across the WAN on a remote branch.
Unless your WAN topology is fully meshed, peer-to-peer apps are probably not so efficient. It's better to use a direct-push strategy. Take a look at Microsoft DFS (distributed file system) - you can control replication links and times, or use a protocol like FTP and put QoS network restrictions on it. Schedule pushes for off-peak hours, where possible. Stagger updates to each branch, if necessary. My company uses an IBM product called Tivoli to push updates to branches, because it has bandwidth control capabilities. There are other apps like this out there (probably cheaper as well).
BitTorrent is better suited to Internet downloads, and because bandwidth is controlled autonomously in each client, what's to prevent clients in different sites from hogging all the bandwidth in any given branch?
Both Kontiki and Ignite sell enterprise-type (supported, maintained etc.) P2P systems that can be deployed internally if you need something off-the-shelf.
I would need to 'push' the files out, and not rely on users to click a torrent file at each site.
Ever heard of remote login, especially ssh <host> <command>?
Sure, go ahead, mod me -1 Obvious.
This signature intentionally left unblank.
...or P2P when you first mention it to the CIO.
I would venture most CIOs' exposure to such things has been limited to what the popular media is pushing: BitTorrent == PIRACY.
I'd recommend sticking to vague terms like "Distributed file transfer".
What platform is used?
Is it scriptable readily?
How scheduled are the updates?
How similar is the data day to day?
Things that come to mind as a traditionally Unix admin:
-cron job to download the file using screen and btdownloadcurses
-ssh login to each site and do the same (if need to push at arbitrary times)
-rsync (if the day-to-day diff is small, might as well do this)
Analogous procedures can probably be done for whatever platform you choose. Learning how to generically apply this strategy in the platform of choice is vital for any administrator of a distributed system.
XML is like violence. If it doesn't solve the problem, use more.
How are they connected to each other? If the same bottleneck router is used to reach each other, then it is a moot point. People often forget about the underlying network workings and abstract away that important detail. They can reach each other's IPs, but that is not to say all traffic doesn't go through the same weak link in the chain regardless.
I'm guilty of abstracting away that detail in contemplating his article.
If it turns out his network architecture has the same bottleneck either way, all the more reason he needs to take a hard look at his data and how amenable it is to rsync.
I work for a large company (>50,000 employees). IT recently rolled out a new "video delivery service." The system delivers videos to everyone's desktop. The system is designed by Kontiki. It's basically an enterprise BitTorrent tool which Kontiki prefers to call, "peer-assisted."
A company well-funded enough to have "C-level" execs, shouldn't have ghetto bandwidth.
http://udpcast.linux.lu/
It's spelled 'NNTP'. Look at how Usenet newsgroups, especially for binaries, have worked for decades for a robust distribution model. The commands to assemble the messages can be scripted as well.
Similarly, the BitTorrent files you describe can also be pushed or pulled from a centralized target list and activated via SSH as needed.
We have been working on our own proprietary protocol that resembles BitTorrent but offers a bunch of features BitTorrent doesn't.
It's not BitTorrent, just like it. We need it for transferring up to 50 GB+ of data around the world every week.
I agree that for your purposes, a simpler solution is probably in order though. RSync can be very powerful with a scripting layer on top of it. Others have also mentioned iTorrent which is an option.
Consider scripting the process of creating a torrent file of the data that needs replication. At each remote site, run some Linux or BSD system and set up SSH keys so the central server can run a script on each remote machine.
Set up a local BitTorrent tracker.
On the main server, script building the torrent file and run an upload script against a list of remote sites that would download the torrent file via scp and run it until it has seeded out a given amount OR has run for x days.
The only issue here that I see is that you said that you are using IPsec over DSL, which implies that all of your bandwidth goes through the central site anyway. You would need to build IPsec tunnels between sites and make sure that you have routes in place to use the secondary tunnels for the appropriate IP addresses.
Why not send it simultaneously to all locations using multicast?
What about uploading an encrypted version to S3 which can then be downloaded via torrent or the S3 API?
We needed a better solution for pushing out server images to all of our data centers automatically. These images were 5 to 50 GB in size. We have 80gb pipes connecting data centers but we needed to get the images to all imaging servers as fast as possible. We set up a central server that acted as our index of torrents as well as our tracker. The daemons on the remote servers were configured to monitor the torrent index, and when a new torrent was added they downloaded it. Once a server had downloaded the file it seeded indefinitely, so when a new imaging server came online it would seed for it. We are a .NET shop so we used MonoTorrent for both the tracker and integrated it into our imaging daemon. The index was just a web server directory with directory browsing turned on. Works like a charm.
What about something such as dropbox... http://www.getdropbox.com/
Well, why provision the data center with more expensive bandwidth, if a p2p solution can solve the problem without spending much/any extra money? Don't ever buy more of a resource until you are efficiently using the resource. Only if you are using it efficiently (or at least, as efficiently as you really can), and it's *still* not enough, should you actually buy more.
Businesses are pretty adamant about expense justification (and they should be). You have to justify any expenses, and even when they are justified, if the company doesn't have the money, they won't spend it (usually).
Browsers should support bittorrent-URLs right out of the box, there's really no excuse for not doing this. It would make hosting (large-ish) static content so much easier.
"I love my job, but I hate talking to people like you" (Freddie Mercury)
I've thought about working with either RSS or pushing .torrent files and then having torrent daemons sync files for me. But so far that's been overly complicated as I have ~1500 production web servers to keep up to date on different code releases depending on what cluster they're in.
Currently we have a database of our production equipment and we keep track of what cluster/role a server is in. So the boxes all run a cron every minute that checks to see if a new "version" of a release is available (any code change, whether a new code release or just an update, increments the version for that cluster).
So we have one super seed, if you will, which keeps a copy of code for all clusters. When you update a release (kept elsewhere) and then "push", the change you made gets copied to the super seed machine. Once that is done the version gets incremented and then the crons start to look for the new code.
Now we use rsync for our transfer. So we keep our /etc/rsync.conf up to date with the potential subnets that we have internally so random outside machines on the WAN can't rsync to our boxes. But we also keep a "lock" table in our database along with the state of each server.
:)
The basic idea of the locking is that we limit the number of peers a box is allowed to have. So in order to not impact production traffic it's set to 3 currently (plenty fast). A "peer" (any machine in the cluster that is not the seed) that already has the code is used first, and the boxes that fired off the cron last sit in a queue (waiting to get a lock on anything with code that hasn't reached its limit of locks). So when a push first starts out, no peer has the code yet and the first three boxes fail over to the super seed. Then the peers start to get it from those boxes once they are updated.
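For anyone trying to picture the lock scheme, here is a rough sketch in Python. It is only a toy model of the description above, not the poster's actual code: the round-based timing, the names pick_source and simulate, and the host names are all made up for illustration, and the real system uses a database lock table and rsync rather than in-memory dictionaries.

```python
from collections import defaultdict

MAX_LOCKS = 3  # per-source limit mentioned in the post


def pick_source(has_code, locks, super_seed, max_locks=MAX_LOCKS):
    """Return a source for a box that needs the code, or None if it must queue."""
    # Prefer any already-updated peer that still has a free lock slot.
    for peer in sorted(has_code):
        if peer != super_seed and locks[peer] < max_locks:
            return peer
    # No usable peer: fail over to the super seed if it has a free slot.
    if locks[super_seed] < max_locks:
        return super_seed
    return None


def simulate(boxes, super_seed="seed"):
    """Hypothetical round-based model: transfers started in a round finish at its end."""
    has_code = {super_seed}
    waiting = list(boxes)
    rounds = []
    while waiting:
        locks = defaultdict(int)  # lock counts are reset every round
        started, still_waiting = [], []
        for box in waiting:
            source = pick_source(has_code, locks, super_seed)
            if source is None:
                still_waiting.append(box)  # sits in the queue for the next round
            else:
                locks[source] += 1
                started.append((box, source))
        has_code.update(box for box, _ in started)
        rounds.append(started)
        waiting = still_waiting
    return rounds


if __name__ == "__main__":
    for number, started in enumerate(simulate([f"web{i:02d}" for i in range(15)]), 1):
        print(number, started)
```

In this toy model, 15 boxes with a limit of 3 finish in two rounds: the first three pull from the super seed, and the other twelve pull from those three peers plus the seed. That fan-out is the reason the parallel push is so much faster than doing it in sequence.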
Doing a code push to our larger cluster used to take 45 minutes to an hour (was done in sequence and not in parallel) but now takes about 5 minutes.
My next goals are to distribute the super seeds and potentially use RPM distribution, since I'm working on making the code release only restart the fewest services necessary in order to pick up the changes. And with RPM I can have those commands in the install portion.
Midget Tosser
agreed. Have you been watching the economy? well-funded shouldnt imply retarded.
If you maintain a culture of appropriate thriftiness at every level of your organizations, you will likely never get to the point of having 1 executive riding a private jet that can move 20 people for a dinner meeting.
That being said, bandwidth can be pretty cheap and at most places around the country you can get 20Mb of fiber for $500-$700/m.
Remember the key word in the phrase, "appropriate" thriftiness.
A group in the Netherlands has already commercialised BitTorrent to manage enterprise patch deployment. The product used to be called BitRain but was renamed to DistriBrute. You can talk directly to one of the developers, Leo Blom: lblom AT iteleo.nl He was really helpful last time I talked to him about it. Also, as one of the other posts pointed out, if you are going to do this within your VPN cloud, you need to make sure that the VPN tunnels are multi-point (each site can talk directly to the other) or you will not solve your problem (because all traffic will go via the main hub). Please MOD this up as I am pretty sure this is exactly what he is after. http://www.4m88.nl/ Leo Blom
Miro (formerly known as Democracy Player) is multi-platform, has an RSS feed reader and a BitTorrent client built in.
It's multiplatform and being open source I bet it can be run as a daemon.
Doesn't a BitTorrent folder already allow adding additional stuff later?
I would recommend making a small modification of an existing open source torrent client:
Let the download never stop. Make it look for new parts, updates to downloaded parts (via sha1), and new files in the directory structure of the torrent until the end of time.
That way you have an instant error-resistant peer-to-peer backup and replication service that is as easy to use, as copying (or linking) the files into the right folder.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Just write up a couple of Perl scripts: one to send data, and one to sit on the client machines constantly monitoring a port. Fast and easy.
After you have set up the infrastructure, as in rules and a torrent server, what you could do is set up rtorrent at each site to watch a directory for torrents, then simply scp the latest torrent to all sites. rtorrent will grab this and start downloading it. This leaves the issue of potentially purging old files, but that's for another topic.
Sorry if someone's already posted this solution, I don't have time to read all of the replies.
subspace for its communication needs.
I'm confused.
Slashdot "libertarians": Small government for me, big government for those I disagree with. -1, I disagree with you
If you own your network enough, you could consider multicasting. udpcast is one tool for doing this, or you could look into implementing a file carousel. If you don't have the network support for multicast, then this won't be very helpful.
Well, there are a few torrent clients that can be run from the command line, just install one on each machine with a shell script/batch file to do the work.
You could also consider writing something simple using split and wget or whatever.
The .torrent file includes the hash for each part of the file, so your client won't complete the download if the hashes don't match.
The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
Our company has a constantly changing source. Sometimes files are just moved about a little.
I started looking into bittorrent for keeping our vendor in sync but the IT guys ended up home brewing it for a variety of reasons.
Our main office has a slow internet connection, and we were driving a hard disk up to our datacenter for its high speed internet, but we needed some files uploaded as soon as possible, and we didn't want to duplicate transfers.
So the idea was to have multiple seeds for our vendor, and then use the seeds for our off-site backup.
When the file system is changing every few minutes, starting and stopping doesn't work too well.
We would have had to hack it a ton, and we didn't need all of the features, (we wanted all the features, but basic needs came first)
Debian's "apt" package managing utility has some new torrent support, as well as its long established versioning, caching, etc. capabilities. You could possibly (depending on the format of your data) distribute it in .deb packages via apt.
Hey I resent that. None of my DVDs have ever taken a bribe.
Shai Schticks:"You don't make peace with friends, you make peace with enemies"
uTorrent supports this. It is called Initial Seeding. And it does exactly what your script intended.
*drum roll* .... FTP! With a bit of scripting and command line Winzip you can transfer everything automatically. Get a $5 a month unlimited bandwidth hosting account and have your clients check for updates every day. The new stuff is downloaded automatically and you can have the script even email you when the transfer is complete (with details of the CRC and file size).
snakebite from actlab.tv should be a good solution - if it works, i haven't got it to work yet. it watches a folder for normal files, creates .torrent files and lists them on a tracker or html page ... not sure about automated downloading of them, i was going to send out an RSS feed to the clients (each only needed a subset of the total file pool anyway)
bitsadmin, to create jobs for the Background Intelligent Transfer Service (BITS) in Windows.
Maybe I haven't done enough work in the "enterprise", but wouldn't a script in a cron job be more appropriate here? Program it to check for a new .torrent file every day (or at an appropriate frequency), and when there is a new one, start bittorrent.
Off the top of my lame head, I can think of at least two easy ways to check for new torrents. The easiest would be to just download the current torrent file, and cmp it with the previous one.
The second would be to write a python script which keeps track of the etag or modification time of your .torrent update. It is easy, just read Chapter 11 of Dive into Python. Section 11.6 appears to have what you want.
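The "download it and compare" check could look something like this in Python. It's only a sketch: torrent_changed and the state file name are made up for illustration, the download step is left out (the function just takes the bytes you fetched), and the call that would hand the torrent to your client is left as a comment because it depends on which client you run.

```python
import hashlib
from pathlib import Path


def torrent_changed(new_bytes: bytes, last_seen: Path) -> bool:
    """Return True, and remember the new copy, if it differs from the last one seen."""
    new_digest = hashlib.sha256(new_bytes).hexdigest()
    old_digest = (
        hashlib.sha256(last_seen.read_bytes()).hexdigest()
        if last_seen.exists()
        else None
    )
    if new_digest == old_digest:
        return False  # same torrent as last time, nothing to do
    last_seen.write_bytes(new_bytes)  # keep a copy for the next comparison
    return True


if __name__ == "__main__":
    state = Path("last_seen.torrent")  # hypothetical state file
    fetched = b"d8:announce...e"       # placeholder for the downloaded bytes
    if torrent_changed(fetched, state):
        print("new torrent, start the client here")
```

Run from cron, this stays quiet until the published .torrent actually changes. The ETag or modification time approach saves you the download, at the cost of trusting the web server's headers.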
At first glance this would be easily implemented via BITS and local squid proxy server.
What you really want is a solution like Netapp file servers. It will distribute the files at the file system block level, updating only the blocks that have changed. You install a filer at your central office, then have multiple mirrors of that one at the various field offices. All the PCs get their boot image off the network file server (the local one). With one update, you can upgrade every PC in the entire company.
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
DFS is your friend.
This really depends on:
* If the sites all use the same 4G DR file
* How many sites you have
For instance, if you had 10 sites that all used the same file you would seed from the source site and other sites would join the p2p download as they join the share. Let's say that you base it on time zones. And you have 2 sites start the download at time A, another 3 join at time B, and the final 4 (one is "hosting" the seeded file) join at time C. A will download directly from your hosting server, B will be able to download from the hosting server and the A servers, C can join the fun and download from any of the sites.
From a bandwidth perspective, the hosting site will have the entire contents downloaded from it (uplink), the A sites will most likely get the next largest traffic, followed by the B sites and other C sites. (I realize that you cannot guarantee how much is downloaded from where.) So, you have to be mindful of link speeds, upload caps / limits, etc. per site.
If you do not need the same 4G file at each site, it may just be easier to use a DVD and ship it off site.
Just my 2 cents.
uTorrent has a network interface built-in that I use to add torrents to my dad's machine from my machine, the file downloads on his server and is then available on our home network. If you have uTorrent at all these disparate sites their interface can be made available to you where ever you are, the interface can also be locked down to only be available to a specific ip address, etc.
I've found that GetDropBox.com is a great tool for replicating files across machines. It has a 2gb "Free" version but I'm wondering if there is a paid service for more space? Only those files that are changed are replicated, very bandwidth friendly, etc.
Namaste
konspire2b - http://konspire.sourceforge.net/
_claimed_ to be faster for distribution than BT at the time. Maybe that is not true anymore, but I always thought konspire2b looked interesting for intranet stuff.
zsync - http://zsync.moria.org.uk/ zsync precomputes the checksums for rsync so it moves the checksum load to the client side. It also claims to have a way to deal with compressed files efficiently. Again, looks interesting for intranet distribution.
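For the curious, the rsync-style trick that zsync builds on can be sketched in a few lines of Python. This is a simplified illustration, not zsync's actual file format or code: the 4-byte block size, the function names, and the use of SHA-1 as the strong hash are assumptions chosen to keep the example short. The publisher computes a cheap "weak" checksum and a strong hash per block once, and each client slides a window over its old copy to find blocks it already has.

```python
import hashlib

M = 1 << 16  # modulus used for both halves of the weak checksum


def weak(block: bytes):
    """rsync-style weak checksum: (plain byte sum, position-weighted sum)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte forward without rescanning the whole block."""
    a2 = (a - out_byte + in_byte) % M
    b2 = (b - n * out_byte + a2) % M
    return a2, b2


def block_table(data: bytes, n: int):
    """What the publisher precomputes: weak checksum -> [(strong hash, block index)]."""
    table = {}
    for start in range(0, len(data) - n + 1, n):
        blk = data[start:start + n]
        table.setdefault(weak(blk), []).append((hashlib.sha1(blk).digest(), start // n))
    return table


def find_blocks(local: bytes, table, n: int):
    """Return {block index: offset in local} for published blocks the client already holds."""
    found = {}
    if len(local) < n:
        return found
    a, b = weak(local[:n])
    pos = 0
    while True:
        for strong, idx in table.get((a, b), []):
            # The weak checksum can collide, so confirm with the strong hash.
            if idx not in found and hashlib.sha1(local[pos:pos + n]).digest() == strong:
                found[idx] = pos
        if pos + n >= len(local):
            break
        a, b = roll(a, b, local[pos], local[pos + n], n)
        pos += 1
    return found


if __name__ == "__main__":
    table = block_table(b"AAAABBBBCCCCDDDD", 4)
    print(find_blocks(b"xxBBBBCCCCyy", table, 4))
```

Only the blocks that don't turn up in the local copy need to cross the WAN, and all of the scanning happens on the client, which is the "moves the checksum load to the client side" part.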
Now if we could combine some of the features of konspire2b, zsync and dsync we'd have the ultimate file distribution system.
First understand the problem.
Does the "main" site have 4G of data that needs to be protected for recovery in the case of a disaster? If so, just about every post before this one will produce a solution that will not work. For DR, you should be designing a system that recovers fast. Unless you have an incredible pipe, there is no way you are going to pull 4G of data from your remote sites in a business-acceptable amount of time. Chances are you will probably go with a tape system or similar architecture.
If you're just distributing 4G of data from a "master" to many other locations, the rsync or BT ideas will work, but don't mistake this as being a DR solution.
I would bet money that your CIO had some kid relative who is just entering the IT world make the suggestion. No-one who does any real DR planning would seriously consider what most of this post & thread are about.
-a *real* professional DR planner
Groove (http://groove.net) does exactly what you want. It's already in use for file distribution in the private and public sectors; coast guards, hurricane warning systems, as well as many large-scale businesses use it successfully.
I got curious about this post, which has already attracted some attention in the blogosphere. ("Solaris at Microsoft?") Some googling reveals that WebTV was originally developed on BSDi, then moved to Solaris. At that point (1997) they were acquired by Microsoft.
I seem to recall reading that many of Microsoft's Unix-based acquisitions have had trouble moving to NT, despite the obvious pressure to do so. So there were probably Solaris servers at Microsoft's Mountain View campus (where WebTV is located) for some years. But it's been 11 years, and I'm sure those Solaris servers are long gone. You'll notice that J refers to them in the past tense.
I wrote an application that I implemented on 20,000+ machines that replicated files over 256kb WAN lines using UDP, because UDP is connectionless and is at a lower priority than TCP.
It sent 1 kB/sec for a total of about 8GB a day. Of course it was no easy task (2yrs development) to create a protocol that can detect and recover missing parts of a file then reconstruct the file when it is complete. As well as replicating to machines that were off when they were turned back on. It used MS SQL server to maintain the file lists, sends and WAN connections. It ran for a couple of years.
My work canned the whole thing in favor of Microsoft SMS, because it has web reports I guess. Not many companies have as many branches as we do (>1,000).
I am implementing something similar using trackerless torrents, DHT and LSD from libtorrent. All you need to do is distribute the torrent hashes to the clients, and libtorrent will take care of the rest.