Microsoft Wants P2P Avalanche to Crush BitTorrent

← Back to Stories (view on slashdot.org)

Microsoft Wants P2P Avalanche to Crush BitTorrent

Posted by ryuzaki0 on Thursday June 16, 2005 @08:05AM from the bet-it-won't-be-open-source dept.

pacopico writes "Microsoft seems to think it can be the better Bittorrent. You know faster and more well-behaved. The Register has a story on the P2P work being done by Microsoft's researchers in the UK. Redmond reckons its "Avalanche" technology will be 20 to 30 percent faster than BitTorrent. It's meant for legal downloads only, of course."

10 of 545 comments (clear)

Min score:

Reason:

Sort:

Alright! by qw(name) · 2005-06-16 08:06 · Score: 3, Informative

The ultimate in spyware!!!
Distributed PAR2 by Ececheira · 2005-06-16 08:13 · Score: 4, Informative

The way the Register describes it, it appears that rather than sending out chunks of the actual file, it's sending out something similar to PAR chunks where once you have enough data, you can reconstruct the original file.

Futher, with a few chunks, you can calculate new chunks to send over to others, that way more people have access to more of pieces of the file.

Sounds interesting, I wonder if it'll be incorporated into the next version of BT.
1. Re:Distributed PAR2 by Anonymous Coward · 2005-06-16 08:28 · Score: 1, Informative
  
  If you could calculate new chunks from the ones you already have, wouldn't it be faster to omit the calculable chunks and have everyone download the smaller file faster? I mean, isn't that the basic theory behind file compression?
  
  If a file was originally 10 chunks, you add 2 more chunks. Then anybody that has 10 out of any of the 12 chunks can get the original 10. Since you now have the original 10 chunks you can recreate the additional 2 chunks so the next guy can get any of the 12 from you. The idea being that if two different people only have 9, hopefully they won't have the exact same 9 chunks and will be able to complete each other and then provide the 12 to everyone.
2. Re:Distributed PAR2 by MikeBabcock · 2005-06-16 08:33 · Score: 3, Informative
  
  Its mathematically impossible to do this with less data than an original already-compressed stream.
  
  PAR data is additional redundant data to allow reconstruction of files for which not all the original blocks are any longer available.
  
  This is a *real* problem in some cases, mind you, but it requires sending *more* data, not less.
  
  The additional data is either padded onto each block (as they describe it) or as additional blocks (the way RAID5 or PAR works). Either way, you're talking about having *more* data on average.
  
  If no seeds become available *and* all the available peers do not combined have all of the blocks you each need *and* the blocks that are present are sufficient to reconstruct (from their redundant bits) the missing blocks, this becomes useful.
  
  --
  - Michael T. Babcock (Yes, I blog)
3. Re:Distributed PAR2 by Wesley+Felter · 2005-06-16 08:38 · Score: 3, Informative
  
  Bram won't add FEC to BitTorrent because he's not convinced of the benefits in real-world situations. (Like most papers on this subject, Avalanche omits a lot of real-world details.)
4. Re:Distributed PAR2 by ChadN · 2005-06-16 09:40 · Score: 4, Informative
  
  A simple example for those reading who don't understand, then some follow up comments:
  
  Say I have bits 'a' and 'b', that other people want.
  
  I could sent bit 'a', then bit 'b' to receiver FOO, who can pass them on to others. However, if I send bit 'a' first, and others want 'b', they have to wait.
  
  Now, instead of transmitting to FOO bit 'a' then bit 'b', I send to FOO ('a' XOR 'b') first, then either bit 'a' or bit 'b'. I'll end up sending FOO the same amount of information (assuming the order is specified in the protocol itself).
  
  BUT, and here's the cool part. If someone already has 'a', they can get ('a' XOR 'b') from you, and complete their set of data (bits 'a' and 'b'). Furthermore, if someone already has 'b', they also get ('a' XOR 'b') from you, and complete their set. So, by only downloading 1 bit, instead of 2, you can complete the set for others who already have one or the other bits.
  
  Now, in practice it'll get a lot more complicated, and the method presented in the paper is not exactly like I describe, but the idea is that you can send data to help people complete their data sets, even though you yourself do not yet have the actual uncomputed data. Instead, you have a computed function of the data, which others can use immediately, and from which you can reconstruct the actual data later when you have more information.
  
  The practical upshot is that the computed data is more valuable to other peers than the uncomputed data, as they may be able to use it to complete their data set, rather than wait for the remainder of the uncomputed data.
  
  So, in reference to your comments, it may not be so much more practical to any one receiver; they still need to wait for all the data, in either computed or uncomputed form. But, for the network as a whole, it means that each receiver has many more options from which to download and compute each chunk, and thus make available to others. It is not hard to imagine that this can benefit the overall throughput of the network (which the authors of the paper claim).
  
  --
  "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
Re:Better? No. by Alioth · 2005-06-16 09:17 · Score: 1, Informative

TANSTAAFL. If the file is already well compressed, generating blocks from parity information won't make it faster - because there's no more redundancy to be squeezed out, you have to transfer more data than the raw file to be able to have the extra information to generate these blocks. Sure, you may be able to get it slightly faster, perhaps 0.5-1% - but certainly not 20-30%, unless the file is uncompressed or only lightly compressed.

Most files that go out via BT are very well compressed.

--
Oolite: Elite-like game. For Mac, Linux and Windows
Re:Can we stop... by ajs · 2005-06-16 09:46 · Score: 3, Informative

Not shocking. The folks as MS Research are actually extremely bright, and often given a rather long leash. It's Micrsoft the software company that usually permutes the fruits of MS Research into the crash-freindly pablum that we've become all to familiar with.
Re:Better? No. by StikyPad · 2005-06-16 09:47 · Score: 2, Informative

This also solves "the last block" problem where everyone is waiting for the last block, since if you have 99% of the blocks you can generate what's left.

Not really, it just (possibly) changes the nature of the last block. .PARs don't require any less data to be downloaded, it's just that you can substitute parity data for the original data, then do whatever transformation on that to get the original data back. If the file you're trying to get it 1GB, you're still going to need to download 1GB, whether it's 100% original data, 80/20, or anywhere in between.

The only thing this really helps is if clients prioritize the parity data and then all seeds disappear, although it's of very limited use there as well, since the data shared between the remaining peers still needs to total 100% of the file size.

--
https://www.eff.org/https-everywhere
Re:Better? No. by rips123 · 2005-06-16 12:27 · Score: 2, Informative

Here's where I call BS: "20-30% faster."
I don't know if its BS. I actually read this paper last week as network coding is an area related to my field of study and it seems pretty legitimate. The paper actually claims much larger increases compared to uncoded transfers in cases such as networks made predominately of slow nodes with infrequent well-connected nodes.

The technique is actually pretty neat. They form a set of linear equations of the form:

ax_5 + bx_4 + cx_3 + dx_2 + ex_1 = g

where a,b,c,d,e,f are chosen randomly and x_n represents the data to transmit. They then send the coefficients and result a,b,c,d,e,f,g to other nodes.

With a block size of n, you typically need n sets of such coefficients (assuming they're linearly independent) to recover the original data.

This basically makes the rarest block problems of bittorrent irrelevant assuming the server has seeded a little more than n data blocks.