Use BitTorrent To Verify, Clean Up Files
jweatherley writes "I found a new (for me at least) use for BitTorrent. I had been trying to download beta 4 of the iPhone SDK for the last few days. First I downloaded the 1.5GB file from Apple's site. The download completed, but the disk image would not verify. I tried to install it anyway, but it fell over on the gcc4.2 package. Many things are cheap in India, but bandwidth is not one of them. I can't just download files > 1GB without worrying about reaching my monthly cap, and there are Doctor Who episodes to be watched. Fortunately we have uncapped hours in the night, so I downloaded it again. md5sum confirmed that the disk image differed from the previous one, but it still wouldn't verify, and fell over on gcc4.2 once more. Damn." That's not the end of the story, though — read on for a quick description of how BitTorrent saved the day in jweatherley's case.
jweatherley continues: "I wasn't having much success with Apple, so I headed off to the resurgent Demonoid. Sure enough they had a torrent of the SDK. I was going to set it up to download during the uncapped night hours, but then I had an idea. BitTorrent would be able to identify the bad chunks in the disk image I had downloaded from Apple, so I replaced the placeholder file that Azureus had created with a corrupt SDK disk image, and then reimported the torrent file. Sure enough it checked the file and declared it 99.7% complete. A few minutes later I had a valid disk image and installed the SDK. Verification and repair of corrupt files is a new use of BitTorrent for me; I thought I would share a useful way of repairing large, corrupt, but widely available, files."
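For anyone wondering what the client does when it "checks the file": a .torrent carries a SHA-1 hash for every fixed-size piece, and the client rehashes whatever is on disk and compares. Here is a minimal sketch of that idea, not Azureus's actual code. The tiny piece length, the sample data, and the function names are assumptions for illustration only.

```python
import hashlib

def piece_hashes(data: bytes, piece_len: int) -> list[bytes]:
    """SHA-1 of each fixed-size piece, as a torrent's metadata would list them."""
    return [hashlib.sha1(data[i:i + piece_len]).digest()
            for i in range(0, len(data), piece_len)]

def bad_pieces(data: bytes, expected: list[bytes], piece_len: int) -> list[int]:
    """Indices of pieces whose on-disk hash does not match the expected hash."""
    bad = []
    for i, want in enumerate(expected):
        chunk = data[i * piece_len:(i + 1) * piece_len]
        # A truncated file yields a short or empty chunk, which also fails.
        if hashlib.sha1(chunk).digest() != want:
            bad.append(i)
    return bad

# Hypothetical demo: 10 pieces of 4 bytes each, one flipped byte.
PIECE_LEN = 4
good = bytes(range(40))
expected = piece_hashes(good, PIECE_LEN)

corrupt = bytearray(good)
corrupt[9] ^= 0xFF                      # damage one byte inside piece 2

damaged = bad_pieces(bytes(corrupt), expected, PIECE_LEN)
percent_complete = 100 * (len(expected) - len(damaged)) / len(expected)
print(damaged, percent_complete)
```

Only the pieces listed in `damaged` need to be fetched from the swarm, which is why a mostly intact 1.5GB image can be repaired in minutes.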
Awesome idea. I've done this in the past. If a corrupt version was on one tracker, I'd save the files, get a new torrent and import the old files. Saves a lot of wasted bandwidth.
If I happen to see a stuck torrent (many leechers, no seeds), sometimes I can find a good version of the file I already have - so I start the torrent, stop it, replace the single good file (sometimes you need more if the file is smaller than the part size), and upload a few Kb to finish the torrent. Then sit back and watch as everyone fills up.
Those who have never developed P2P software might never understand why it all needs strong checksums to detect data corruption, or why bad blocks actually do appear in the wild, and frequently.
You'd be shocked - SHOCKED - at how much data gets corrupted routinely - by errant antivirus software, flaky network equipment, plain ol' line noise that the checksums don't detect (which will happen much more often than you expect, see also birthday paradox), or misbehaving routers that think any occurrence of 0xC0A80102 obviously must be an internal IP address and needs to be changed to your external one. Even if that's in the middle of a ZIP file. Oops.
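To make the "checksums don't detect" part concrete, here is a rough sketch, assuming the checksum in question is the 16-bit ones' complement Internet checksum that TCP uses. Because it is just a sum of 16-bit words, swapping two words leaves it unchanged, while a cryptographic hash notices. The sample bytes are made up.

```python
import hashlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

original = b"\x12\x34\xab\xcd"
swapped = b"\xab\xcd\x12\x34"           # same 16-bit words, different order

same_checksum = internet_checksum(original) == internet_checksum(swapped)
same_sha1 = hashlib.sha1(original).digest() == hashlib.sha1(swapped).digest()
print(same_checksum, same_sha1)
```

The corruption passes the transport checksum but fails the hash, which is the gap that P2P piece hashes are there to close.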
Encryption actually aids this somewhat, as the same byte patterns don't get repeated, so if there's an errant IDS changing things for example, it tends not to fire the second time.
I've done this before for file repairs. Works a treat, but you sort of wish that BitTorrent used a Merkle hash tree such as the modified THEX standard Tiger Tree Hash. SHA-1's so last century.
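For readers who haven't met a Merkle hash tree: the file is hashed in small blocks, pairs of hashes are hashed together, and so on up to a single root, so one short root value can vouch for any block. A minimal sketch follows. It uses SHA-256 as a stand-in because Tiger is not in Python's hashlib, and the 0x00/0x01 prefixes and the "promote the unpaired node" rule reflect my understanding of THEX, so treat those details as assumptions.

```python
import hashlib

def _h(prefix: bytes, data: bytes) -> bytes:
    # SHA-256 stands in for Tiger here (assumption, see the note above).
    return hashlib.sha256(prefix + data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    """Root of a hash tree over the given data blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [_h(b"\x00", b) for b in blocks]        # leaf hashes
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(_h(b"\x01", level[i] + level[i + 1]))  # internal node
        if len(level) % 2:
            nxt.append(level[-1])                   # unpaired node moves up as-is
        level = nxt
    return level[0]

blocks = [b"block-%d" % i for i in range(5)]
root = merkle_root(blocks)

tampered = list(blocks)
tampered[3] = b"block-X"
print(root.hex() == merkle_root(tampered).hex())
```

The practical appeal is that a client needs only the root up front and can verify individual blocks with a handful of sibling hashes rather than a full per-piece hash list.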
We have been doing this for ages for certain high-demand game files that we mirror. While offering torrents for some of our download mirrors is only mildly useful (as we're in Australia we're trying to keep bandwidth on-shore to cut down international traffic, and BT doesn't really help this), it is extremely helpful for the VAST number of users who appear to either have massively crazy Internet problems or are simply unable to drive an HTTP-based downloader and resume downloads.
When a large number of users are having problems downloading or resuming a particular file, I simply create a torrent for them and give them some vague instructions about how to resume it, and then generally I never hear from them again. They're happy because they don't have to download a 4GB game client again from scratch, they don't have to worry about resuming/corrupt downloads, and because it's a torrent it probably feels like they're getting something for free that they shouldn't be.
Could also be one's routers.
There was a problem with D-Link routers back in the day that hit a lot of P2P users. If you placed your machine in the DMZ, the router basically did a search and replace on all packets, replacing the bitstring representing the global address with the bitstring representing the local address. On large files, this didn't just hit in the IP header, but in the data as well, corrupting it. If you didn't use the DMZ functionality, just port mapping, it worked fine. So if you were using BitTorrent, you'd get repeated hash fails on some parts that would never fix, because BitTorrent has no capability to work around that (as opposed to eMule's extensions).
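A toy illustration of why those pieces never fix themselves, assuming the router does a blind byte-for-byte replace as described. The addresses and payload are hypothetical (203.0.113.7 is a documentation address standing in for the global IP).

```python
import hashlib
import ipaddress

def naive_nat_rewrite(packet: bytes, global_ip: str, local_ip: str) -> bytes:
    """Blindly replace every occurrence of the global address bytes."""
    old = ipaddress.IPv4Address(global_ip).packed
    new = ipaddress.IPv4Address(local_ip).packed
    return packet.replace(old, new)

GLOBAL_IP, LOCAL_IP = "203.0.113.7", "192.168.1.2"   # hypothetical addresses

# A piece of file data that happens to contain the global address bytes.
piece = b"PK\x03\x04" + ipaddress.IPv4Address(GLOBAL_IP).packed + b"rest of archive"
expected_hash = hashlib.sha1(piece).digest()

first_try = naive_nat_rewrite(piece, GLOBAL_IP, LOCAL_IP)
second_try = naive_nat_rewrite(piece, GLOBAL_IP, LOCAL_IP)

hash_ok = hashlib.sha1(first_try).digest() == expected_hash
print(hash_ok, first_try == second_try)
```

The corruption is deterministic, so every retry of that piece arrives damaged in exactly the same way and fails the hash check again.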
For even more fun, if you have two differently-corrupted copies of a file and a torrent to go with it, then you can have BitTorrent stitch them together into a valid file without involving any third parties.
I used Azureus's internal tracker ability and two computers on a local network with the torrent modified to track on one of the machines, and one corrupted copy of the file on each.
Obviously only works if they don't have corruption in common, but it also doesn't require the original torrent file tracker to work anymore.
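The stitching trick boils down to something like the sketch below: for each piece, take whichever copy matches the expected hash. This is only an illustration of the principle, not what Azureus does internally. The piece length, sample data, and function name are assumptions.

```python
import hashlib

def stitch(copy_a: bytes, copy_b: bytes, expected: list[bytes], piece_len: int):
    """Build a file from two damaged copies; return (data, unrecoverable pieces)."""
    out = bytearray()
    missing = []
    for i, want in enumerate(expected):
        lo, hi = i * piece_len, (i + 1) * piece_len
        for candidate in (copy_a[lo:hi], copy_b[lo:hi]):
            if hashlib.sha1(candidate).digest() == want:
                out += candidate
                break
        else:
            # Both copies are bad here: the "corruption in common" case.
            missing.append(i)
            out += bytes(len(copy_a[lo:hi]))        # zero-filled placeholder
    return bytes(out), missing

PIECE_LEN = 8
original = bytes(range(32))                         # 4 pieces
expected = [hashlib.sha1(original[i:i + PIECE_LEN]).digest()
            for i in range(0, len(original), PIECE_LEN)]

a = bytearray(original); a[3] ^= 0xFF               # copy A: piece 0 damaged
b = bytearray(original); b[20] ^= 0xFF              # copy B: piece 2 damaged

stitched, missing = stitch(bytes(a), bytes(b), expected, PIECE_LEN)
print(stitched == original, missing)
```

If both copies are damaged in the same piece, its index shows up in `missing` and that piece still has to come from somewhere else.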
IIRC TCP/IP has a guaranteed maximum error rate of at least 10^-5 bits. Well, the thing is, 1.5 Gigabytes is over 10^10 bits in length. So even at such an error rate, it is not guaranteed that your file will arrive without bit errors.
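Taking that recalled 10^-5 figure purely at face value (it is the poster's recollection, not something I can confirm about TCP/IP), the arithmetic works out as in this quick sketch. The variable names are illustrative.

```python
# Assumed inputs: file size from the story, error rate as recalled above.
file_bytes = 1_500_000_000            # 1.5 GB, decimal gigabytes
bit_error_rate = 1e-5                 # assumption: residual errors per bit

file_bits = file_bytes * 8            # 12,000,000,000 bits, i.e. over 10^10
expected_bad_bits = file_bits * bit_error_rate

print(file_bits, round(expected_bad_bits))
```

Under that assumption you would expect on the order of a hundred thousand flipped bits in a download this size, so an independent end-to-end check such as md5sum or per-piece hashes is what actually tells you the file is intact.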