Microsoft Wants P2P Avalanche to Crush BitTorrent
pacopico writes "Microsoft seems to think it can be the better BitTorrent. You know, faster and more well-behaved. The Register has a story on the P2P work being done by Microsoft's researchers in the UK. Redmond reckons its "Avalanche" technology will be 20 to 30 percent faster than BitTorrent. It's meant for legal downloads only, of course."
Will it block access to MP3 files and a big list of other file-types/filename-extensions? Like MSN Messenger 7 does? But, like MSN Messenger, allow .WMA files? And do this under the guise of "security", alleging that MP3 is an "unsafe" format (though unlike WMAs, MP3s can't launch websites or "acquire licenses" and stuff like that)..
SCO employee? Check out the bounty
Sounds like a combination of BitTorrent mixed with PAR functionality. Being able to generate missing segments from the segments you already have. This actually isn't a bad idea. Probably means more overhead in the download though just like there's overhead in PAR files.
I'm going to laugh my ass off when someone finds a trivial way to defeat whatever DRM MS puts into this to make sure the content is legal, and they get sued for helping distribute copyrighted material.
Not laugh because they get sued, but laugh because I can almost guarantee that MS has the money and the lawyers to get off on the "we didn't host it" argument. And in doing so, they are big enough to set precedent, and will thus free every other P2P software maker as well.
Of course, how damn amusing would it be if their P2P was used to share...illegal copies of MS products?
Velociraptor = Distiraptor / Timeraptor
Microsoft sees itself running out of runway. It's hard to grow when your market penetration is as high as theirs is. They basically rely on new computer users to help them grow as convincing old customers to upgrade only maintains their last financial position.
They have the ability to enter many other markets all at once, so that's what they're doing hoping they'll stick in a few places. Music is an easy one. This P2P app is also easy because they can include it with Longhorn, release their own patches with it and force partners to use it. Image editing is less likely. They've already been reasonably successful with their Media PC.
You'll likely see them enter a few other new markets this year and next, but they will fail in all but a couple.
The global economy is a great thing until you feel it locally.
I read that in Singapore, the world capital for techno-fascist innovation, trucks would have flashing lights attached to poles on the side of the cab. When a sensor on the engine detected that the truck's speed went above 35 MPH, the light would start blinking. Then the first police car to see it would issue them a speeding ticket.
If only half the things that I've heard about Singapore are remotely true, then this is one seriously weird place that reasonable people would be wise to avoid.
"Even Microsoft is developing P2P!"
Really, their server products already use a P2P or S2S (Server To Server, servers being each other's peers...) technology for domain replication. Windows 2000 is pretty darn good at replicating its content even when the original copy isn't available.
Of course, YMMV, and the right setup is key.
Get your Unix fortune now!
Microsoft Research has been working on efficient, decentralized, and fault-tolerant P2P systems since 2001. See the paper about their DHT (Distributed Hash Table) called Pastry, which was co-authored with Rice and is still under active development there. Note that the Kademlia DHT, which followed roughly a year later and is now used in a variety of P2P networks (eMule, the new decentralized BitTorrent network, etc.) employs a variant of Pastry's routing algorithm of longest prefix matching.
They still have quite a presence if you look through recent NSDI or IPTPS conferences. Note that this paper is for IEEE INFOCOM, which is big.
- shadowmatter
Actually, if you read the research paper, you can see WHY it's faster. Basically, it combines two technologies: a BitTorrent-like protocol and file parity generation (such as PAR). This allows you to generate additional pieces you didn't download and reduce the amount of data you need to download by about 20-30%.
This also solves "the last block" problem where everyone is waiting for the last block, since if you have 99% of the blocks you can generate what's left.
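To make the parity idea concrete, here is a minimal sketch. It assumes the simplest possible scheme, a single XOR parity block over equal-sized blocks, which can rebuild exactly one missing block. Real PAR files use Reed-Solomon codes and can recover several blocks, and Avalanche's actual coding is different again, so treat the function names and the scheme as illustrative assumptions only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """Build one parity block (hypothetical helper, single-XOR scheme)."""
    return xor_blocks(blocks)


def recover_missing(received, parity):
    """Rebuild the single missing block (marked None) from the rest plus parity."""
    present = [b for b in received if b is not None]
    return xor_blocks(present + [parity])


blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(blocks)

# Pretend block 1 never arrived: it can still be regenerated locally.
received = [blocks[0], None, blocks[2], blocks[3]]
recovered = recover_missing(received, parity)
```

With only one parity block, a second missing block would be unrecoverable, which is why real schemes carry more redundancy and more overhead.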
It's an interesting approach.
If you need web hosting, you could do worse than here
Exactly. Just like the turds I hear screaming that MS Anti-Spyware is "awesome." They fail to see it's a take-off of SpyBot/Ad-aware and it's biased to remove whatever MS deems as spyware - RealVNC? Give me a fucking break.
Where's my free iPod!? Until then, I'll settle for a kiss...
Why do you need to guess what it's about when it's all there in the paper linked to by the article? I've skimmed it, gotten the gist of it, and I think their technique is quite clever. And the paper seems to give full details, so anyone can implement it.
Basically, similar coding schemes make scheduling of data in a swarm easier (so there's no choking/unchoking a la BitTorrent, data just flows) and minimize the risk of a file piece being owned by only one peer (if he leaves, downloading is over). These encoding schemes, through linear combinations of pieces using XOR, combat this (I'm generalizing here). The most attractive, I think, are Rateless and Raptor codes, which have similar performance. (Incidentally, the former was developed by Petar Maymounkov, who was actually one of the inventors of Kademlia.)
Anyway, a few months ago I read the Rateless paper, and thought "Gee, I should code this and release it under the GPL... It would be great for P2P apps!" But soon after I finished its implementation, I discovered that all the ideas authored in the Rateless paper were actually covered by patents of Digital Fountain, meaning that Petar's company, Rateless, had to develop a different, proprietary coding mechanism that is outside the patents of DF, and I can't release my code!
So, getting back to my original point, the paper says, "Network coding can be seen as an extension or generalization of the Digital Fountain approach since both the server and the end-system nodes perform information encoding." Meaning that it might not be covered by DF's patents, and thus should be welcomed by the P2P community, and not immediately disregarded blindly by prejudice. I mean, if it's a 20% improvement, why not give it a chance, huh?
- shadowmatter
And in this case, by creating a BitTorrent work-alike, they can draw up patent specs that INCLUDE BitTorrent's features, and then use that patent to shut down the servers. Time to start informing the Patent Offices!
Also, folks, make a note of the DATE of that paper describing Avalanche. One PTO rule that seems to me gets violated often is that there is supposed to be (or used to be) a one-year limit between the public release of an invention's description and the patent application. After more than a year, it's too late to apply. How many existing dubious patents were applied-for too late and could be overturned on those grounds?
Every time someone asks you for a block, you send them a new block, which is a random linear combination of all the blocks you have. This new block will almost always be useful to them. As soon as you get n blocks, where n is the number of blocks in the original file, you can reconstruct the original file. So bandwidth is never wasted sending a block the long way when the short way would do - you squeeze the maximum work from every hop.
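Here is a rough sketch of that idea, under loud assumptions: it works over GF(2), so a "random linear combination" is just the XOR of a random subset of the blocks you hold, and blocks are modelled as 64-bit integers. The names (encode, Decoder) are made up for illustration and nothing here is taken from the Avalanche paper. With coefficients this small, a noticeable fraction of packets turn out not to be useful; the "almost always useful" property needs a larger field.

```python
import random


def encode(held, rng):
    """Return a random GF(2) combination of held (coeff_mask, payload) packets.

    Assumes held is non-empty and every packet has a nonzero coefficient mask.
    """
    while True:
        coeff, payload = 0, 0
        for c, p in held:
            if rng.random() < 0.5:
                coeff ^= c
                payload ^= p
        if coeff:  # skip the useless all-zero combination
            return coeff, payload


class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, n):
        self.n = n
        self.rows = {}  # pivot bit -> (coeff_mask, payload), kept fully reduced

    def add(self, coeff, payload):
        """Absorb a packet; return True if it carried new information."""
        for pivot, (c, p) in self.rows.items():
            if (coeff >> pivot) & 1:
                coeff ^= c
                payload ^= p
        if coeff == 0:
            return False  # linearly dependent on what we already hold
        pivot = coeff.bit_length() - 1
        # Clear the new pivot bit from existing rows to stay fully reduced.
        for k, (c, p) in list(self.rows.items()):
            if (c >> pivot) & 1:
                self.rows[k] = (c ^ coeff, p ^ payload)
        self.rows[pivot] = (coeff, payload)
        return True

    def done(self):
        return len(self.rows) == self.n

    def blocks(self):
        """Original blocks in order; only valid once done() is True."""
        return [self.rows[i][1] for i in range(self.n)]


rng = random.Random(0)
n = 8
original = [rng.getrandbits(64) for _ in range(n)]
source = [(1 << i, blk) for i, blk in enumerate(original)]

# The relay fills up from the source, then re-encodes from what it holds.
relay = Decoder(n)
while not relay.done():
    relay.add(*encode(source, rng))

receiver = Decoder(n)
sent = 0
while not receiver.done():
    receiver.add(*encode(list(relay.rows.values()), rng))
    sent += 1
```

The point of the relay step is that it never forwards a stored packet as-is. It mixes what it has, so the receiver does not care which particular packets the relay happened to collect.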
The really interesting bit is right at the end, almost as an aside:
"In Avalanche we use special sets of secure hash functions that survive network coding operations and consume very little computational resources."
So even though each block is novel, they have a way for the receiver to ensure that it's a real piece of the puzzle. That's a hard problem indeed! So why isn't the solution part of the paper? Are they holding off from publishing that until the patent comes through?
Xenu loves you!
Let's say that you have a bunch of people using BitTorrent. The only people who have segment 499 are behind slow modems. But lots of people want those.
If there's a rare part, you only need one downloader with a decent upstream to break the bottleneck. By the rarest-first scheduling algorithm used in both BitTorrent and eMule, the rarity of segment 499 would have long ago prompted some user with broadband to go get segment 499 from the dial-up user and then start seeding it out to other downloaders, quickly remedying the situation. Besides, with the "penis size varies directly with share ratio" mentality in many BT communities, there will still be quite a few complete seeds once demand for the file builds up.
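For anyone who hasn't seen it, rarest-first selection can be sketched in a few lines. This is a simplified illustration, not the logic of any real client: peer availability is modelled as plain sets of piece indices, the names are hypothetical, and ties are broken by lowest index purely to keep the result deterministic (real clients typically pick randomly among the rarest).

```python
from collections import Counter


def rarest_first(my_pieces, peer_pieces):
    """Pick the piece we lack that the fewest peers hold, or None if nothing is on offer."""
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)
    candidates = [p for p in availability if p not in my_pieces]
    if not candidates:
        return None
    # Fewest holders first; lowest index as a deterministic tie-break.
    return min(candidates, key=lambda p: (availability[p], p))


peers = {
    "dialup": {0, 1, 499},
    "cable1": {0, 1, 2},
    "cable2": {0, 1, 2},
}

# Segment 499 has a single holder, so it is requested before anything else.
next_piece = rarest_first({0}, peers)
```

Once a broadband peer has fetched 499 this way, its availability count goes up and the dial-up user stops being the bottleneck.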
I read the paper too! They state 2-3 times speedup over BitTorrent for badly connected networks.
Recovering the original file is trickier than it looks though...
They state that they have to invert a matrix with O(nblocks^2) entries to recover the original file. This takes O(nblocks^3) operations. Since there is only 1 bit per entry, that will fit into memory and won't be a problem. There are plenty of ways to invert matrices faster than O(nblocks^3) too.
They then have to undo the linear equations, which is an O(nblocks^2) operation. However, each of those operations works on a whole block of the original file. If you have a 4GB file (say) broken up into 4,000 1MB blocks, you'll need to do 16,000,000 x 1 MB operations, i.e. 16,000,000,000,000 bytes of operations, which takes a while even at L1 cache speed! If you haven't got 4GB of RAM, that's going to cause an awful lot of disk IO.
I guess you'd allocate the largest buffer you can and run through the file file_size / buffer_size times. Since file_size / buffer_size probably isn't huge, 10 or so (4 GB / 400 MB, say), you'll only have to do 40GB of IO to tidy up at the end. With a 40 MB/s disk that should take about 17 minutes. Not insignificant, but probably quicker than the network IO!
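The back-of-the-envelope numbers above can be checked with a few lines. This is only the arithmetic from this post, assuming decimal units (1 MB = 10^6 bytes) and the cost model described here (one full pass over the file per buffer-sized chunk); the function and its parameter names are made up for the estimate.

```python
def decode_cost(file_bytes, block_bytes, buffer_bytes, disk_bytes_per_s):
    """Rough decode cost estimate under the assumptions stated in the post."""
    n_blocks = file_bytes // block_bytes
    passes = -(-file_bytes // buffer_bytes)  # ceiling division
    io_bytes = passes * file_bytes
    return {
        "n_blocks": n_blocks,
        "matrix_ops": n_blocks ** 3,              # naive matrix inversion
        "byte_ops": n_blocks ** 2 * block_bytes,  # undoing the combinations
        "passes": passes,
        "io_bytes": io_bytes,
        "io_seconds": io_bytes / disk_bytes_per_s,
    }


MB = 10 ** 6
GB = 10 ** 9

est = decode_cost(file_bytes=4 * GB, block_bytes=1 * MB,
                  buffer_bytes=400 * MB, disk_bytes_per_s=40 * MB)
minutes = est["io_seconds"] / 60
```

Plugging in a bigger buffer or a faster disk shows how quickly the tidy-up cost drops. The 16 TB of byte operations is the part that doesn't go away.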
Every man for himself, all in favour say "I"