Slashdot Mirror


Use BitTorrent To Verify, Clean Up Files

jweatherley writes "I found a new (for me at least) use for BitTorrent. I had been trying to download beta 4 of the iPhone SDK for the last few days. First I downloaded the 1.5GB file from Apple's site. The download completed, but the disk image would not verify. I tried to install it anyway, but it fell over on the gcc4.2 package. Many things are cheap in India, but bandwidth is not one of them. I can't just download files > 1GB without worrying about reaching my monthly cap, and there are Doctor Who episodes to be watched. Fortunately we have uncapped hours in the night, so I downloaded it again. md5sum confirmed that the disk image differed from the previous one, but it still wouldn't verify, and fell over on gcc4.2 once more. Damn." That's not the end of the story, though — read on for a quick description of how BitTorrent saved the day in jweatherley's case.

jweatherley continues: "I wasn't having much success with Apple, so I headed off to the resurgent Demonoid. Sure enough they had a torrent of the SDK. I was going to set it up to download during the uncapped night hours, but then I had an idea. BitTorrent would be able to identify the bad chunks in the disk image I had downloaded from Apple, so I replaced the placeholder file that Azureus had created with a corrupt SDK disk image, and then reimported the torrent file. Sure enough it checked the file and declared it 99.7% complete. A few minutes later I had a valid disk image and installed the SDK. Verification and repair of corrupt files is a new use of BitTorrent for me; I thought I would share a useful way of repairing large, corrupt, but widely available, files."
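
The repair works because a .torrent file stores a SHA-1 hash for every fixed-size piece of the content, so a client can check each piece of an existing file on its own and fetch only the ones that fail. A minimal sketch of that check, assuming the whole file fits in memory; the piece size, the demo data, and the function names `piece_hashes` and `find_bad_pieces` are illustrative and not taken from any real client:

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_len: int) -> List[bytes]:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_len]).digest()
            for i in range(0, len(data), piece_len)]


def find_bad_pieces(data: bytes, expected: List[bytes], piece_len: int) -> List[int]:
    """Indices of pieces whose digest differs from the expected digest."""
    actual = piece_hashes(data, piece_len)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Hypothetical demo data: a "good" image and a copy with one damaged byte.
PIECE_LEN = 1024
good = bytes(range(256)) * 64              # 16384 bytes, i.e. 16 pieces
expected = piece_hashes(good, PIECE_LEN)   # stands in for the hashes in the .torrent
corrupt = bytearray(good)
corrupt[5000] ^= 0xFF                      # flip one byte; offset 5000 lies in piece 4
bad = find_bad_pieces(bytes(corrupt), expected, PIECE_LEN)
print("pieces to re-download:", bad)
```

Only the pieces listed in `bad` would need to come from the swarm, which is why a mostly intact download shows up as nearly complete.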

11 of 212 comments

  1. !new by gustgr · · Score: 2, Insightful

    For heavy BT users this tactic is very common, provided the file(s) you want to download are fairly widely available from different sources.

    1. Re:!new by SanityInAnarchy · · Score: 2, Insightful

      It's an older concept than that, even. Goes back to the strange Debian habit of using a tool called Jigdo -- it would provide essentially a recipe for building an ISO out of all the files needed, where the files were mostly available from standard Debian mirrors. ISOs were available from far fewer mirrors than standard Debian packages, you see.

      So, you'd use Jigdo, and if all went well, it'd assemble a working image. But if a few packages couldn't be downloaded, you could always take your mostly-complete Jigdo file and use rsync with an rsync-capable mirror. (Or, more recently, BitTorrent on Ubuntu -- but that's another story.)

      I don't think this tactic is very common, though, as most people seem to have no fucking clue how BitTorrent works. I've seen torrents with gigantic multipart RARs, with an SFV of those. Let's see... so, my torrent software is already checksumming everything, and RAR has a builtin checksum too, or at least, acts like it does (it says "ok" or not) -- and on top of that, there's an SFV checksum (crappy CRC32), too. Never mind that RAR saves you at most a few megabytes (video is already compressed), which, based on the size of these files, you'll spend more time unpacking the RAR than you would downloading the extra couple megs. Or that, once you unpack and throw away the RAR, you can't seed that torrent from the working video. Or that multipart anything is retarded on BitTorrent, as the torrent is splitting it into 512k-4meg chunks anyway.

      Whoops, end of rant. Oh, by the way, that wasn't about me, it was about my friend. Wink wink.

      --
      Don't thank God, thank a doctor!
  2. Re:What broken software were you using? by Dice · · Score: 5, Insightful

    I asked the same question. Wikipedia answered it.

  3. Re:What broken software were you using? by kcbanner · · Score: 3, Insightful

    It's networking - shit happens. Some of his bits got thrown out of a router somewhere as heat, or maybe a packet timed out and didn't quite make it.

    --
    Obligatory blog plug: http://www.caseybanner.ca/
  4. Re:Nice by empaler · · Score: 5, Insightful

    I assume you then continued seeding? :)

  5. Re:Nice by CastrTroy · · Score: 2, Insightful

    Reminds me of the "PARS" I used to get off usenet. I think it was basically a RAR split up into hundreds of pieces, with parity information in each of the files. You only needed to download a certain percentage of the files to reconstruct the original file. It was great, because often pieces of the file would go missing, or become corrupted somewhere along the way.
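
The idea of rebuilding a lost piece from parity can be shown with plain XOR. This is a simplified sketch, not how PAR files actually work: real PAR/PAR2 sets use Reed-Solomon coding and can recover several missing blocks, while a single XOR parity block can rebuild only one. The block contents and the names `xor_blocks` and `recover` are made up for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the single missing block (marked None) from the rest plus parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    present = [b for b in blocks if b is not None]
    rebuilt = xor_blocks(present + [parity])
    return [rebuilt if b is None else b for b in blocks]


# Hypothetical data: three equal-size blocks and one parity block.
original = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(original)
damaged = [original[0], None, original[2]]   # the second block went missing
restored = recover(damaged, parity)
print(restored == original)
```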

    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  6. Re:simpler home-brew technique by Just Some Guy · · Score: 2, Insightful

    The person with the bad file runs option 1 to make the check file and sends that to the person with the good file. They run option 2 which identifies bad chunks and exports them, which they send back to the first person. Run option 3 and the exports are patched into their download and it's fixed.

    Isn't that almost exactly how rsync works?
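
The three options quoted above can be sketched as three small functions. This assumes both copies have the same length and uses fixed-size chunks; rsync itself goes further and uses a rolling checksum so it can match blocks at arbitrary offsets. The chunk size, sample data, and function names are hypothetical.

```python
import hashlib
from typing import Dict, List

CHUNK = 4  # tiny chunk size so the demo is easy to follow


def make_check(data: bytes, chunk: int = CHUNK) -> List[str]:
    """Option 1: one SHA-256 hex digest per chunk of the (possibly bad) file."""
    return [hashlib.sha256(data[i:i + chunk]).hexdigest()
            for i in range(0, len(data), chunk)]


def export_bad_chunks(good: bytes, check: List[str], chunk: int = CHUNK) -> Dict[int, bytes]:
    """Option 2: on the good copy, export every chunk whose digest disagrees."""
    good_check = make_check(good, chunk)
    return {i: good[i * chunk:(i + 1) * chunk]
            for i, digest in enumerate(good_check)
            if i >= len(check) or check[i] != digest}


def apply_patch(bad: bytes, patch: Dict[int, bytes], chunk: int = CHUNK) -> bytes:
    """Option 3: overwrite the bad chunks with the exported good ones."""
    out = bytearray(bad)
    for i, data in patch.items():
        out[i * chunk:i * chunk + len(data)] = data
    return bytes(out)


# Hypothetical files: the bad copy differs from the good one in a single byte.
good_file = b"the quick brown fox jumps"
bad_file = bytearray(good_file)
bad_file[10] = ord("X")                            # offset 10 falls in chunk 2
bad_file = bytes(bad_file)

check = make_check(bad_file)                       # sent to the person with the good file
patch = export_bad_chunks(good_file, check)        # sent back to the first person
repaired = apply_patch(bad_file, patch)
print(sorted(patch), repaired == good_file)
```

Only the check list and the exported chunks cross the wire, so the traffic stays small when most of the file is intact.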

    --
    Dewey, what part of this looks like authorities should be involved?
  7. Re:Nice by Hes Nikke · · Score: 2, Insightful

    Done this with RAR archived stuff as well. (Multipart rars on torrents are retarded, but that's another issue entirely.) Any idea why the multipart RAR torrents tend to have healthier swarms than single file torrents of the same content? It pisses me off!
    --
    Don't call me back. Give me a call back. Bye. So yeah. But bye our, well, but alright we are on a shirt this chill.
  8. Re:What asshole tagged this '!news'? by Free the Cowards · · Score: 2, Insightful

    To be honest, when I saw this story I was shocked it had shown up. I thought that using BitTorrent to repair mostly-whole files was obvious for this crowd. It's like "Using Water to Nourish Your Plants" showing up on a horticulturist site. If you know anything about how BitTorrent works then you should immediately realize that it will fix up mostly-good files for you.

    The subsequent discussion has revealed that a large chunk of the slashdot population not only doesn't understand how BitTorrent works but doesn't even know about classic open source tools like rsync.

    --
    If you mod me Overrated, you are admitting that you have no penis.
  9. Re:Nice by Jurily · · Score: 3, Insightful

    Me too. But I never thought about the endless possibilities here.
    Just ship everything with a .torrent to verify.

    (Wow, all the authorities we could annoy with one minor change!)

  10. Re:Nice by operagost · · Score: 4, Insightful

    Any modern file system will fragment if you expand an existing file. It simply has no way to guess how big the file will get when it is created unless your application chooses the proper allocation.

    To give you an extreme example, imagine a 100 GB volume which has no files. You create a 1 MB file, and your filesystem places it near the top. Now you create a second file, and your filesystem places it... well, it could place it anywhere except that first 1 MB, so let's say it places it right next to the first file. Uh oh, it turns out that you need to write 1 GB of data to that first file and extend it. Now you have two fragments.

    Ok, let's assume our file system is magical and knows that you like to extend files to huge sizes. So it places the second file at the end of the disk, instead. Oops, you fooled your file system: this time, you wanted to extend the second file by 1 GB. There is no room to append to the end of the file, so a second extent is created somewhere else and linked to the second file. You have two fragments again.

    This is why performance tuning requires that you anticipate data requirements and allocate space accordingly; for example, by setting the initial size of database files to one that should reasonably accommodate the data requirements for the foreseeable future (and not automatically shrinking the database down when records are deleted).
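
For an application that knows the final size in advance, such as a download client, "choosing the proper allocation" can mean reserving the whole file before writing any data. A rough sketch, assuming a Unix-like system where Python exposes `os.posix_fallocate`; the fallback to `truncate` only sets the file length and may leave a sparse file, so it does not give the same guarantee. The file name and size are made up, and whether the reserved space ends up contiguous is still up to the filesystem.

```python
import os
import tempfile


def preallocate(path: str, size: int) -> None:
    """Create `path` and try to reserve `size` bytes for it up front."""
    with open(path, "wb") as f:
        try:
            # Asks the filesystem to allocate the blocks now (Unix only).
            os.posix_fallocate(f.fileno(), 0, size)
        except (AttributeError, OSError):
            # Fallback: sets the length only; the file may be sparse.
            f.truncate(size)


# Hypothetical usage: reserve 1 MiB for a download before any piece arrives.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "download.part")
    preallocate(target, 1024 * 1024)
    reserved = os.path.getsize(target)
    print(reserved)
```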

    --

    Gamingmuseum.com: Give your 3D accelerator a rest.