Use BitTorrent To Verify, Clean Up Files
jweatherley writes "I found a new (for me at least) use for BitTorrent. I had been trying to download beta 4 of the iPhone SDK for the last few days. First I downloaded the 1.5GB file from Apple's site. The download completed, but the disk image would not verify. I tried to install it anyway, but it fell over on the gcc4.2 package. Many things are cheap in India, but bandwidth is not one of them. I can't just download files > 1GB without worrying about reaching my monthly cap, and there are Doctor Who episodes to be watched. Fortunately we have uncapped hours in the night, so I downloaded it again. md5sum confirmed that the disk image differed from the previous one, but it still wouldn't verify, and fell over on gcc4.2 once more. Damn." That's not the end of the story, though — read on for a quick description of how BitTorrent saved the day in jweatherley's case.
jweatherley continues: "I wasn't having much success with Apple, so I headed off to the resurgent Demonoid. Sure enough they had a torrent of the SDK. I was going to set it up to download during the uncapped night hours, but then I had an idea. BitTorrent would be able to identify the bad chunks in the disk image I had downloaded from Apple, so I replaced the placeholder file that Azureus had created with a corrupt SDK disk image, and then reimported the torrent file. Sure enough it checked the file and declared it 99.7% complete. A few minutes later I had a valid disk image and installed the SDK. Verification and repair of corrupt files is a new use of BitTorrent for me; I thought I would share a useful way of repairing large, corrupt, but widely available, files."
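For anyone wondering how the client could declare the image "99.7% complete": a torrent carries a SHA-1 hash for every fixed-size piece of the payload, so a client can hash what is already on disk and flag only the pieces that differ. Below is a minimal sketch of that check, not Azureus's actual code. The names `piece_hashes`, `bad_pieces`, `percent_complete` and the 256 KiB piece size are assumptions made for illustration, and the data is held in memory to keep the example short.

```python
import hashlib

# Assumed piece length for illustration; a real torrent states its own.
PIECE_SIZE = 256 * 1024


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Return the SHA-1 digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[start:start + piece_size]).digest()
        for start in range(0, len(data), piece_size)
    ]


def bad_pieces(data: bytes, expected: list[bytes],
               piece_size: int = PIECE_SIZE) -> list[int]:
    """Return the indexes of pieces that are missing or fail their hash."""
    actual = piece_hashes(data, piece_size)
    return [
        index for index, want in enumerate(expected)
        if index >= len(actual) or actual[index] != want
    ]


def percent_complete(data: bytes, expected: list[bytes],
                     piece_size: int = PIECE_SIZE) -> float:
    """Share of pieces that already match, as a percentage."""
    if not expected:
        return 100.0
    good = len(expected) - len(bad_pieces(data, expected, piece_size))
    return 100.0 * good / len(expected)
```

Only the pieces returned by `bad_pieces` would need to be fetched again, which is why a mostly intact 1.5GB image can be repaired in minutes.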
Awesome idea. I've done this in the past with stuff. If a corrupt version was on one tracker, I'd save the files, get a new torrent and import the old files. Saves a lot of bandwidth wasting.
TCP/IP provides data integrity guarantees. So, if your ISP wasn't mucking with your packets (and their checksums), either Apple was sending the wrong bits or your hardware or software was screwing with them. My vote is it's not Apple.
I suggest you diagnose your computer problems, rather than relying on BitTorrent to fix them for you.
If I happen to see a stuck torrent (many leechers, no seeds), sometimes I can find a good version of the file I already have - so I start the torrent, stop it, replace the single good file (sometimes you need more if the file is smaller than the part size), and upload a few Kb to finish the torrent. Then sit back and watch as everyone fills up.
Those of us who use BitTorrent for *ahem* illegal purposes have been doing this since the beginning. The only way to get rare and complete downloads was to take the files to other trackers and match them against another md5 to finish the download.
It's like getting parity files over on Usenet to fix that damned .r23 file which is just a bit too short for some reason :)
Fiesta Online
For heavy BT users this tactic is very common, provided the files you want to download are fairly widely available from different sources.
Are there even MD5 hashes on Apple's download pages for such large files? Judging by how the article was written and the lack of hashes on the QuickTime and iTunes download sites, it doesn't seem like they even bother.
One should be more concerned as to why your files are becoming corrupted.
I'd say it's a safe bet that the files from apple.com are in perfect condition.
Which means it either became corrupted in transit to, or on arrival at, your machine.
Which raises the question: is your memory defective?
Run memtest86 to check your memory.
http://www.memtest86.com/
Check whether your hard drives support SMART and are reporting anything. A disk checker would also be a good idea.
The other idea that springs to mind is that you're behind some proxy with the above problems, although I doubt anyone would want to proxy a 1.5GB file.
Fact is, if files are being corrupted on your disk, it's just a matter of time before something more important is hit by corruption.
To avoid criticism; Say nothing, Do nothing, Be nothing.
But Torchwood is usually pretty good, imho.
I've used bittorrent for this purpose many times in years gone by.
:)
Especially with our slow links, or worse yet, on dialup (if I go enough years back) in Australia.
Before bittorrent I would use rsync. That required me to download the large file to a server in the US on a fast connection, then rsync my copy to the server's copy to fix what is corrupt in my copy.
It works beautifully.
You can tell how powerful someone is by the magnitude of the crime they can commit and be able to get away with.
We have been doing this for ages for certain high-demand game files that we mirror. While offering torrents for some of our download mirrors is only mildly useful (as we're in Australia we're trying to keep bandwidth on-shore to cut down international traffic, and BT doesn't really help this), it is extremely helpful for the VAST number of users that appear to either have massively crazy Internet problems or are simply unable to drive an HTTP-based downloader and resume downloads.
When a large number of users are having problems downloading or resuming a particular file, I simply create a torrent for them and give them some vague instructions about how to resume it, and then generally I never hear from them again. They're happy because they don't have to download a 4GB game client again from scratch, they don't have to worry about resuming/corrupt downloads, and because it's a torrent it probably feels like they're getting something for free that they shouldn't be.
Big deal, I do this all the time. It also helps when you're downloading files via Torrent and supplement with pieces from the newsgroups. This combination works well because newsgroups often have RAR'd binaries that are missing files. Find a similar package available on a Torrent site and fill in the missing files. Hell, you can start the Torrent first and do a Force Check as you add each piece. Why not just download the whole thing via Torrent then? Well, NNTP is local and much faster... Had I known this was worthy of a Slashdot submission I would have done it a long time ago.
For even more fun, if you have two differently-corrupted copies of a file and a torrent to go with it, then you can have BitTorrent stitch them together into a valid file without involving any third parties.
I used Azureus's internal tracker ability and two computers on a local network with the torrent modified to track on one of the machines, and one corrupted copy of the file on each.
Obviously only works if they don't have corruption in common, but it also doesn't require the original torrent file tracker to work anymore.
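The stitching trick can be sketched without a tracker at all: for each piece, take it from whichever copy passes the hash. This is only an illustration of the idea, not what Azureus does internally. The function name `stitch` is hypothetical, and it assumes you already have the per-piece SHA-1 hashes and the piece size from the torrent.

```python
import hashlib


def stitch(copy_a: bytes, copy_b: bytes,
           expected: list[bytes], piece_size: int) -> bytes:
    """Rebuild a file from two damaged copies using per-piece SHA-1 hashes.

    Raises ValueError if some piece is corrupt in both copies, which is
    the "corruption in common" case that cannot be repaired locally.
    """
    out = bytearray()
    for index, want in enumerate(expected):
        start = index * piece_size
        end = start + piece_size
        # Try copy A first, then fall back to copy B.
        for candidate in (copy_a[start:end], copy_b[start:end]):
            if hashlib.sha1(candidate).digest() == want:
                out += candidate
                break
        else:
            raise ValueError(f"piece {index} is corrupt in both copies")
    return bytes(out)
```

With two machines and a local tracker the client does the same selection for you, one piece at a time, over the network.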
Using BitTorrent for its actual, legal, intended use. I love it!!!
BitTorrent has received a bad reputation because of pirates. There are legitimate uses though. I do believe that Doctor Who episodes aren't public domain, so shame on you for that. Might want to be careful what you admit to on /.
I'm not a lawyer though. I just hope it doesn't violate Apple's NDA. Please please please follow the rules. Don't want to see you in prison or slapped with a large fine.
Hey guys, check this out! I just found out that you can send emails to multiple people AT THE SAME TIME by putting a comma between their email addresses! Pretty cool, huh?
This guy's the limit!
I can have 60% of a file downloaded but have BitTorrent only see 10%, I'm guessing because it's missing an article somewhere around there. Any client that zeroes missing file parts instead of simply not writing them? Is that possible? What about with par2 files?
I used to buy Debian CDs from a Linux shop in Sydney, Australia. A few times I'd get badly burned CD-Rs from them, so I would take an image of the bad discs with dd, then rsync that image to fix it and then burn the fixed image.
Worked perfectly every time. I'd rather use BitTorrent for that though. Probably be quicker.
Just for the sake of chaos, here are 5:
lcw82wrfd7vxyf6iuzq2i6l3kyrbos1lyykdd1bjxq9v5
6xeeoo52jhmaijrjodvhehaeqn3w70keuwxvajby
ky8cswluxe0jh2km2rw5tbpc37agdnogk32bq5r98
mfqtowgp6l2gial5leeardj1hw91lv9mey2rgc0s
xdnlqkijbc4fu105hil2jql3g8h9ri61uvtw3g
http://www.demonoid.com/register.php?with_invite=1
Thanks!
Thanks a bunch. FYI /. users, I took the first one; the other 4 are free!
I'm here for the experience, not the Hyperbole.
I wrote this bash script to do basically the same thing. It uses openssl (included with most Unix systems, and with OS X specifically) to create 1 MB check files, basically the same as torrent files. Follow the instructions and it's easy to fix a corrupt download from someone who has a good copy, with the minimum required data transfer. The person with the bad file runs option 1 to make the check file and sends that to the person with the good file. They run option 2, which identifies bad chunks and exports them, and send the exports back to the first person. Run option 3 and the exports are patched into the download, and it's fixed.
Last time I used it, we repaired a 3.8 GB transfer by exchanging 11 MB of data. (The transfer had been resumed multiple times and apparently one of the transfers glitched its offset or something.)
This is easier than BT because using BT can have a bit of a learning curve for seeding. Beta but appears stable. Feedback encouraged.
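The script itself isn't shown here, so what follows is not it. It is a rough Python sketch of the same three-step exchange, under assumptions of my own: SHA-256 as the hash, 1 MB chunks as my reading of the "1mb" above, both files meant to be the same length, and the hypothetical function names `make_check`, `export_bad_chunks` and `apply_patches` standing in for options 1, 2 and 3.

```python
import hashlib

# Assumed chunk size; the post mentions 1mb without further detail.
CHUNK = 1024 * 1024


def make_check(data: bytes, chunk: int = CHUNK) -> list[str]:
    """Option 1: the person with the bad file lists one digest per chunk."""
    return [
        hashlib.sha256(data[start:start + chunk]).hexdigest()
        for start in range(0, len(data), chunk)
    ]


def export_bad_chunks(good: bytes, check: list[str],
                      chunk: int = CHUNK) -> dict[int, bytes]:
    """Option 2: the person with the good file exports mismatched chunks."""
    patches = {}
    for index, digest in enumerate(make_check(good, chunk)):
        if index >= len(check) or check[index] != digest:
            patches[index] = good[index * chunk:(index + 1) * chunk]
    return patches


def apply_patches(bad: bytes, patches: dict[int, bytes],
                  chunk: int = CHUNK) -> bytes:
    """Option 3: overwrite the damaged chunks with the exported ones."""
    fixed = bytearray(bad)
    for index, payload in patches.items():
        start = index * chunk
        fixed[start:start + len(payload)] = payload
    return bytes(fixed)
```

Only the digest list and the exported chunks cross the wire, which is how a multi-gigabyte repair can cost a few megabytes.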
I work for the Department of Redundancy Department.
Bottom 4 are already used. Didn't need to try the first one. Get it while it's hot!
Okay, to celebrate Demonoid's re-opening, here's some more:
f5mptgleeecic81ppcn2hzkjugrg4b4sdglrwe4e
enz2yuz1gsv17mpetil8ltsmq1e17cbtw11fc9uvoa
p6gzu1iguz4o0aep93l5h1jujwt13pg5q9wy5
5ubabvxlkj4z8jmr0iu8kreil7xcf7jkp2ia2252442
yubtly2w8ghvae5839faz5mmancawheh0vgf70merdm
This is not really anything BitTorrent specific, but good use of available tools. However, I hope you then checksum verified the completed file with an MD5 from Apple or somebody who has downloaded directly from them. While you probably weren't a target of an attack, you did download software from an unknown source. An attacker could download the SDK, insert malicious code, compute a new set of MD5 sums for the torrent file, upload to pirate bay or some tracker, and then seed the torrent expecting that nobody will attempt an external verification.
I had a shitty old hard drive that was failing CRC (cyclic redundancy checks) but the file I had downloaded was 4 gigs, and there were a few corrupt pieces, but by copying it to another hard drive, and replacing just the corrupt pieces I saved myself a shit load of bandwidth.
Orbis terrarum est non altus satis
Wouldn't the methodology just save a lot of bandwidth?
If you save the wasting, does that mean that you subsequently have to waste that bandwidth elsewhere, so that the entropy of the universe remains constant?
Would a sufficient quantity of hoarded bandwidth wasting go unstable, become a bureaucratic singularity, and emerge as a new government, complete with its own non-event horizon?
I worry about stuff like that.
Thanks! The top code worked for me, and the bottom one was already used... (:
Whoever stated that signature sizes should be limited to one hundred and twenty characters can just go ahead and kiss my
OK, maybe not tonight-at-eleven news, but this is a totally clever hack, which is exactly what many people on Slashdot live for.
On a related note, I came up with a roundabout way to do something similar to help a friend who was having trouble moving large files. On the remote end, split the file into small chunks. Then md5 them all and save those results into a text file. Then, ftp them, and when they arrive, md5 them all again and compare your values to what's in the text file. If any don't match, re-download them; else cat them all together and you should be good.
I don't think this would have worked for the submitter, even if he knew someone with a known-good copy of the file, because I imagine these things work linearly, so if the bad part of the file was at the halfway mark, every chunk after that would have the wrong checksum. His method was very, very clever.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Got the second one, thanks.
blog & fiction: jd87
I have been using Torrents for this very reason.
I was required to copy sometimes 10-20GB of virtual machine image files from server to PC, or PC to PC, on up to 40 machines at one time.
This was taking way too long and copies were not perfect.
Restoration of VM images presented the same problem.
Updating a VM meant redistribution of the entire file to all machines again.
Using (Micro) Torrent and my own tracker changed all that.
I came up with the following solution using all available resources.
First I copied all the images to a separate partition on the workstations (about 200GB of VMs).
Then I created my own internal tracker and web page to host torrents.
The results were:
1. Extremely efficient use of all available network hard drive space.
2. Utilizes every machine on the network to distribute the files.
3. Works extremely well restoring or redistributing the VM's to any one machine or several machines at once. (The more the better)
4. 100% accuracy in distribution.
5. The ability to quickly modify any one image on any machine, recreate the torrent(hash) and then update that image across hundreds of machines very quickly.
In other words, modifying a file means that the machines only have to download the pieces that changed, not the whole image again.
6. With Micro Torrent any machine can be used as the tracker.
7. The tracker is also the "master" file server; however, any machine can be used to modify and upload a change.
Just recreate and re-upload the new torrent, replacing the old one. Remember that a torrent file-serving network is not a server-centric file sharing system.
I used to download Linux ISO files directly from FTP or web sites.
Nothing upset me more than downloading an ISO only to find out that after I burned it to CD/DVD, it had CRC errors and random lockups during an install.
After switching to BitTorrent, which verifies every piece against a hash, the problem was solved. It works for other things as well.
Commercial software companies can offer ISO downloads via BitTorrent trackers and send the install CD Key via email. That way customers just burn the CD/DVD and install the key they got in email.
Same thing with media files: download via BitTorrent, then enter an unlock key you get via email when you buy it.
Businesses are stupid if they ignore the benefits of BitTorrent.
Even piracy doesn't hurt that much as most people want to try the software before they buy it. It is like kicking the tires before buying a car and taking it out for a test drive before signing the papers to buy it.
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
No, I've seen this as a requirement for a few private trackers. It put me off on posting as I'm not going to waste my time.
I've used it to finish up the last 3% of a jigdo build when I was missing a file or two. Worked great.
I wonder if you could legitimately argue that you were verifying the data in a personal backup of media that you had?
Unless I am mistaken, it is perfectly legal to make a backup of data that you own right? So, if you already own an item, would downloading it to have a backup be a legal thing to do?
And if that's the case, I wonder what the legal implications are in cases where the RIAA comes down on people who have been "participating in file sharing" activities.
Moved to http://soylentnews.org/. You are invited to join us too!
Assuming you can find a source that serves a known-good file via rsync, it's a very efficient way to fix up a damaged copy.
I once had to download a CD image over a dialup connection when I was at a client site in Mexico. I did the initial download via FTP, but it got corrupted and the MD5 sum didn't match the correct value. It had taken almost two full days to download the first time (over a weekend, so shipping a CD wouldn't have been faster), but rsync was able to find and correct the corrupted sections in less than five minutes.
Rsync is also an unbeatable tool for making incremental backups. I use it (rather, I use rdiff-backup, which uses rsync) to back up a server with almost 30 GiB of data, nightly, over a standard cable modem connection. Last night's, for example, took 57 minutes to run, found 527 changed files totaling 1.36 GiB of 26.2 GiB total. I don't know how much it actually downloaded, but I'm sure it was much less than 1.36 GiB.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Yeah, if I had a 30GB cap, I'd be just as pissed off and irritable as you are. But I don't. (Nelson voice): Ha ha!
Latest µTorrent (1.8+) also allows you to point to a file name that is different on your hard drive. You don't need to worry about file names matching up any more if the bits are identical.
I agree TCP/IP has problems with raw file transfers.
However, a Torrent system ensures the delivery of the file based on the file's hash value.
This is very beneficial if an update or recovery of the original file needs to be made.
Simply recreate the torrent, upload the updated torrent.
Once the clients get the new torrent they only download the changes to that file.
For instance.
A 10GB virtual image file needs to be changed.
Make the changes needed, recreate the torrent, upload the new torrent.
Clients download the new torrent for the same file.
Restart the download of that file to the same location.
The client makes a hash check.
This time according to the hash value only 12% of the file has changed.
Only the pieces needed to match the hash are downloaded.
Not only that, but because it is an asynchronous file transfer across multiple machines, on a large network the update occurs incredibly fast.
This works 100% of the time.
Since a central file server is not needed and any machine on the network can act as the tracker, hardware failure is maybe the biggest concern.
But then again a failed component is always inevitable.
With a torrent system corrupted data transmission is no longer a real problem.
i've done all the tricks mentioned here as well. for a couple of years now, i've been wondering when someone would release a backup/archive/sync tool that utilized bittorrent tech...i'm still waiting...
The first rule of Usenet: don't talk about Usenet.
( Redundancy is ) ^ n
Most do, but you may have to enable it in the program's options.
What wouldn't Jesus do?!
Seriously, get another ISP! I mean, what kind of a BOFH-run ISP only lets you download 1GB per *month*, except for night hours??? Whoever came up with that shouldn't be allowed more than 14k4 for ever!
Besides rsync & torrents, you can also repair files with metalinks, which require nothing extra on the server and are not blocked like p2p in some places.
This is why so many distributions use them for ISO downloads, so you don't have to restart large downloads from the beginning.
I've been doing this with linux ISOs for quite some time. Never thought it could be unknown to anyone.
...yesterday I used BitTorrent to repair an Ubuntu Studio iso that I downloaded from my local ftp firehose. The MD5SUMS mismatched, so I fetched the matching torrent file, fired up KTorrent and pointed it at the dir I downloaded the iso into. Only 1 block needed repairing, saving me a helluva long download.
The Hacker's Guide To The Kernel: Don't panic()!
"zsync is a implementation of rsync over HTTP. It allows updating of files from a remote Web server without requiring a full download or a special remote server application. It uses a metafile, which is created on the server, to determine which parts of a file the user already has; it then downloads the remaining parts via HTTP."
First, as rdebath argues, you only get 16 bits of CRC on TCP headers.
And furthermore, if you start calculating CRCs off random data, chances are good (roughly 40% by the 256th try, and over 50% by about the 300th) that you will get a collision (two chunks of data with the same CRC); this is known as the "birthday paradox" in cryptography. Of course, to be really sure to get a collision you will need to try at most 65537 values; but you will reach a very high probability of clash much sooner than intuition may tell you.
See birthday attack for the math.
Use eMule for widely unavailable files.
Please people. It's very easy. Just go into your settings and look for something that says Protocol Encryption and say 'Enabled'. If everyone gets into this habit, we will all live in a far better world. In fact, encrypt any application (that traverses the Net) you can. Application layer is nobody's business but your own.
Rsync also works nicely for "upgrading" CD images of beta Ubuntu releases to the final version, and for, say, making a Kubuntu Live CD out of the normal GNOME-based Ubuntu one. It has the advantage that it can spot blocks that have moved around in the new version but are still the same, even if they're no longer on block boundaries.
I was sick of multipart files in 1991, ha!
All your points are solved by software, split rars are a hack on deficient protocols or routers that limit BW per tcp connection.
Oh and, what is it with these stupid long ass crap file names, S05E03-XDVD-HPEP-LOL-FUKME.avi
This is not 1972 cobol days dudes, if its unlikely to be a hit like friends, stick to one digit to seasons, S3, E03 is ok.
Kill the lame postfix acronyms, except sensible ones not in caps as they take more pixel, (dvd) or (ts) is smaller.
As gordan ramsey says, "you guys a fuking tossers, your a shit head".
TvShowName-S4ep23.avi is nicer, i always rename because they are TOO DAMN long on HTPC systems. Again, this aint 1200bps modem days. (they werent this bad btw)
Oh and another pet peve of mine to your so called elites, stop resizing 720 rips or tv shows to 604 or 624, if its done only to compress better, or to play on PSP, then why should 90% suffer, 720 original is best on 42in LCDs. Stop resizing because you own a crap 12in crt. Or want to watch tv shows in a psp, cartoons are ok, but not good tv shows. Dont give me this 624 is ntsc in usa shit, only trailer trash own CRTs. If you can afford to download, you own an LCD. If you own a shit tv, well you'll get a better quality any way. Again, read my lips, 624 or 608 sucks 1980s style.
Liberty freedom are no1, not dicks in suits.
Or are they too 1.0 for the kids of today?
I am trolling
I tried using it on our current administration. It showed up as being 29% complete, but unfortunately nobody's seeding the uncorrupted parts that we're missing. :(
For your security, this post has been encrypted with ROT-13, twice.
Funny how you complain about how bad others do your dirty work while you apparently save enough money because of it to watch your favorite shows on expensive hardware.
If there is one thing to be learned on slashdot, it has to be sarcasm.
I have also noticed that P2P programs as a group seem to offer excellent features in the area of moving files, large and small, and not corrupting said files, even in high noise/disconnect environments. It's a feature set that should make its way into web browsers and common downloaders, but that just doesn't seem to happen. Any time I see a file to download and it is over 300MB, I'm like, "oh boy, this could be an adventure".
:-)
The birthday paradox involves a population in which finding ANY two (or more) of the same is considered a match. That does not apply to a TCP header checksum because the comparison needs to be made against ONE SPECIFIC checksum (e.g. the one the packet in question has). You get a packet and it has a checksum. You calculate a checksum from the data. Do they match or not when the data is corrupted? That's not a birthday paradox.
The birthday paradox DOES apply in cases where you want to create TWO packets with the same checksum, but it doesn't matter which checksum that is. You can create two messages with the same hash in the case of cryptography where there is a weak hash. But in the case of error checking, it's not about creating any pair of matching checksums; it's about creating one checksum that matches one you already have that you cannot change. In birthday terms, it's about finding someone in the population that has the same birthday as you do.
OK, it's 16 bits. My bad. TCP bad. But birthday paradox does not apply here.
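The two cases being argued about can be put side by side numerically. The sketch below treats a 16-bit checksum as a uniformly random value, which is an idealization (the real TCP checksum is a ones'-complement sum and is not uniform over corrupted data), and the function names are made up for illustration.

```python
def any_collision_probability(samples: int, space: int = 2 ** 16) -> float:
    """Birthday case: chance that at least two of `samples` values coincide."""
    no_collision = 1.0
    for i in range(samples):
        # The (i + 1)-th value must avoid the i distinct values seen so far.
        no_collision *= (space - i) / space
    return 1.0 - no_collision


def specific_match_probability(space: int = 2 ** 16) -> float:
    """Error-check case: chance that one random value equals one fixed value."""
    return 1.0 / space


if __name__ == "__main__":
    for n in (256, 302, 1000):
        print(n, round(any_collision_probability(n), 4))
    print("specific match:", specific_match_probability())
```

Under that idealization the any-pair probability climbs quickly with the number of samples, while the chance of a corrupted packet matching its one fixed checksum stays at 1 in 65536 per packet, which is the distinction drawn above.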
now we need to go OSS in diesel cars
I hate to have to be the one to inform you but, contrary to what you believe, this is not Ann Coulter.
The security implications of that have always bothered me.
I wonder, does the current Disk Utility app phone home to check the hash? Not that that provides more than a speed bump for the middleman.
Of course, it is somewhat useful for checking file integrity for issues other than crafted corruption.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
Actually, I'm thinking I may have just undone a perfectly good disk swap recently when the problem might have been at Apple's end.
I guess I need to test that disk.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.