UDP + Math = Fast File Transfers
Wolfger writes: "A new file transfer protocol talked about in EE Times gets data from one place to another without actually transmitting the data..." Well, the information is transmitted, just not in its original form. I think there are a couple of other places working on similar ideas - wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name.
User would go to download a game demo or something, receive pieces from several different places, and knit them together?
The file sharing networks based on fasttrack technology do that. You download a movie or game from different users at the same time. Kazaa stitches it back together.
The name of the program michael is referring to is called getright, which can connect to several known mirrors of a file and download seperate fragments from each.
GetRight uses multiple sites to download from and then pieces them back together.
http://www.getright.com
The Transporter Fountain sits alongside a switch or router, and one Transporter Fountain is needed at the sending and receiving ends of a connection. Prices will range between $70,000 and $150,000.
Oh, boy, I'm gonna stop by CompUSA on the way home and grab one of these.
I wonder what equations are used to convert raw unpredictable streams of data to formulas, and how come that the formulas used aren't bigger than the sent packets themselves? They mentioned XOR, but that just sounds silly, because XOR does nothing with data except do some reversible equation on them which does neither shrink or grow data.
Does anyone have more info? It does sound interesting though...
In essence, is not this the same as file compression? The amount of information is the same
(for those, who remember, what Bit is). It is just, that the usual one character per byte is awfully wastefull. Which is why the various compressors are so effective.
Add a modern data transfer protocol and you may :-)
get some start up money
In Soviet Washington the swamp drains you.
You still need some form of flow control or rate limiting, otherwise a large percentage of the UDP packets are going to get dropped. Plus, you have the problem of UDP streams stealing bandwidth from TCP streams on a limited bandwidth link.
Mea navis aericumbens anguillis abundat
I mean it's not for image compression specifically, but it definitely reminds me of IFS image compression in some ways. I'll bet that compression is very time consuming, but that's fine if you're warehousing data. I wonder if the clients are pre-loaded with a body of parameterized functions, so that the server just sends information describing what functions to run and what the parameters are. I guess if it's all based on polynomials all it needs to send are vectors of constants.
Neat idea. Patents: here and here.
> These files routinely are mailed on tape rather
:P Maybe they meant 20gb. Cuz I've seen chip designs + noise analysis + whatever take dozens of gigs.
> than transmitted electronically. "FedEx is a
> hell of a lot more reliable than FTP when
> you're running 20 Mbytes,"
Having worked in the industry they mention, I'd hazard that they don't use ftp more because of the illusion of security than anything else. People in the EDA world (which is where I worked, and has a close relationship with chip manufacturers) are immensely paranoid about people getting ahold of their chip designs, because if someone steals that.. you not only lose your next chip, you enable someone else to make it for you.
These people just don't trust firewalls and ftp yet, but they do trust putting a tape in an envelope and snail mailing it. At the very least it makes someone liable if the letter gets stolen, which you can't do with electronic transfers..
At any rate, ftp is plenty reliable for transfering 20mb files.. I do it every time a new game demo comes out.
"User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name."
Uh...doesn't something like Download Accelerator Plus (yeah yeah, I know its a hive of spyware) already do that (downloads from multiple locations only to recombine the file later)?
Guys, this is nothing like Kazaa. Kazaa will let you download from several sources simultaneously, but only because it just requests different parts of the file from each source. At that point there are still send/ack type protocols in use.
This technology (from the write-up anyway) uses some kind of proprietary technique to re-map the data into another domain and send the information required to reproduce it. It sounds kind of like sending a waveform as a series of Fourier coefficients rather than as actual data samples. By changing to a different domain, it is possible to send metadata from which the original information can be recreated.
I have no idea exactly how they would do it, but it doesn't sound completely impossible.
However, its nothing like Kazaa or GetRight.
I believe Michael's talking about OpenCola's Swarmcast.
The company's name is OpenCola and the name of the product was SwarmCast. The guy who did SwarmCast, Justin Chapewske, is now at a company he started named Onion Networks. OpenCola appears to have completely abandon its original Open Source approach to their software.
Apparently, Justin has taken the GPL portions of Swarmcast and is improving them at Onion Networks.
Need a Python, C++, Unix, Linux develop
Debian does something similar with the Pseudo Image Kit.
It gets all the parts of the install ISO cd image, from disparate sources, stitches them together, and then uses rsync to patch it to exactly make a duplicate of the original install image.
Very nifty.
The quirk is that none of the data is ever transmitted; the receiving end creates its own copy of a file based on a complete set of mathematical equations.
This is called compression. Everybody is doing it and it has been done before.
When you download a ZIP file, you are not downloading the content. You are downloading a mathematically transformed version of it. You then translate it back. Modems have been compressing and decrompressing on the fly since the late 1980s.
Maybe they have a better compression scheme? (Fractal based?) That would be news. Everything else is a distraction.
--- -- - -
Give me LIBERTY, or give me a check.
Oops, make that Justin Chapweske. That's what I get for typing out an odd name from memory. :-)
Need a Python, C++, Unix, Linux develop
For anyone that just wants the jist of the article:
"The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. "
So basically, it's not much more than UDP with a single reply telling the server to stop transmitting.
Not bad, but you better have some good timeouts worked into this thing. UDP by definition is a non-replying "if it gets dropped who cares?" protocol. If the receiver's connection were to go down, wouldn't the server just get flooding all the in-between routers with packets for awhile? That's not good for traffic congestion.
------
Let me give you the lowdown
This could be done easily without the proprietary algorithms. Just send update packets with a header in each on stating that it is packet number N and there are X total packets. Then, request missing packets when you get towards the end, and put them all together in order when you get them all.
Somewhat unrelated --- Does anyone else miss Z-Modem. We need a zmodem like program for that works over telnet so we don't have to open a separate FTP session. In the BBS days, you just typed rz filename and it came to you.
Judging from this:
The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. The receiving box then performs an XOR operation on the symbols to derive the original data.
It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. Apparently there are multiple solutions to the equations that will arrive at the same answer, so when the receiving end has received enough symbols to make it works it says, "stop sending already!" Apparently they're getting their speed because A) things don't have to go in any order (that's how the 'net is supposed to work, right?) and B) Alice and Bob don't have to keep up this conversation: Alice: Hey, Bob, can you send me X? Bob: Okay, are you ready? A: Yes, Go ahead? B: Okay, here it comes. A: I'm waiting. B: Here's the first packet." A: What? That packet didn't make it over. B: Okay, here it is again. A: Okay, I got that packet. B: Good. A: Okay, I'm ready for the second packet. B: Okay, here's the second packet.
Okay, I had too much fun with the Alice and Bob conversation there. Anyway, it looks like there scheme is compressing things in the form of their equations, and then just sending them in a burst until the receiver is happy.
Sounds like it might work, but it'll generate a ton of network traffic, I'd bet!
Someday when we all have extraordinarily fast computers, we'll simply be able to send somebody an MD5 sum and the computers will be able to "crack" it back into the original file. At that point, commercial software wouldn't even have to come on a CD... just print the hash on a slip of paper and the user could type it in.
word.
Skiers and Riders -- http://www.snowjournal.com
Yeah, I don't ftp is so slow that anyone is going to pay $70k for their proprietary 'Transporter Fountains'. Seems like anyone with a little common sense and math ability could easily cobble together a software UDP based transfer protocol that has all of the properties described in the article.
The key is to build in redundancy without increasing the amount of data sent so much that you counteract the speed gains you get by using UDP.
-josh
Just send restart at commands to many different servers, then cat the files onto each other. This is how Dowload Accelerator does it, and Fast Track is the same theory. Programs just take all the mental work out of it.
SIG: HUP
The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..."
This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed. The only time you get a 1 for 1 ack ratio is if you miss a packet and the window slams shut.
Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's. Frame Relay switches may drop packets if you exceed your CIR and begin bursting, so that whole transfer will never succeed. Therefore you actually waste bandwidth cause the whole transfer is doomed to fail, and the sender will never know it.
Also some routers have WRED configured in their queues, purposely dropping TCP packets to increase bandwidth on a global scale. This would damage the file transfer process as well if it was UDP based, as this system is.
Stick with the RFC's and the tried and true TCP transport system. This company will fail.
--- RFC 1149 Compliant.
This sounds a lot like what PostScript is to a rasterized file. A set of descriptions of what the picture looks like, which are small and easy to transmit, which are then drawn to produce the picture.
With real vector PS its easy, since you start out by creating vectors (eg, Adobe Illustrator). How you get from a non-vector "destination" to the metadata you really want to transmit sounds complicated.
Udpcast in FEC mode does this too: in addition to the original data, it can transmit an arbitratry amount of "FEC" blocks which are a linear combination of the data blocks. If some data blocks are lost in transit, udpcast can recalculate them from the FEC blocks by multiplying the vector of received data by the inverse encoding matrix.
Say no to software patents.
Conclusion: Only sales & marketing would try to sell a product that turns 20MB into 100MB, sends it via UDP, only in order to have the results XOR'd together.
Where do they get these people?
Oh, yes, here's a link to the SourceForge project for SwarmCast.
Need a Python, C++, Unix, Linux develop
I don't have the paper in a computer format, but the number is TR-98-021 and John and Michael were both at Berkeley at the time (1998), so it should be fairly easy to find if someone is interested. Doubtlessly, a number of other reports on the subject should also exist that deals with the same problem but with different solutions.
Sorry, the whole article seems to make some magic mumbo-jumbo out of the process. Apparently the file is transformed, but how does that transformation help? The main difference between UDP and TCP in this case is, that TCP maintains the sequence of Packets, so after splitting a file up, sending it as TCP-Packets and combining it again, all parts (sent as Packets) are in the right place. UDP does no such thing, and also UDP doesn't check, if a packet really reached it's destination. This frees UDP of some overhead TCP has. But to send a large File (with a simple approach), now you have to label each UDP-Packet with a sequential number, and, at the end, check if all Packets arrived (and maybe request missing Packets again), then rearrange them according to the sequence numbers.
Now i don't see, how a transformation of content helps here, instead of adding the information where in the file the packet goes (a kind of serial number), now you have to label, where in the equation it should go (a kind of coefficient index), so the receiving end knows, whether it has all information, and which information is still missing, and must be requested again.
"By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
It's an encoding scheme that sends you the instructions on how to build something rather than the stuff itself. Not so special as they make it sound. Saying that you get the data without it being sent to you is the biggest troll for mid-level clueless managers that want to download their "repr0ns" faster. Not that I'm even sure it will work that well....
SIG: HUP
You create your data in records and groups. Each group contains a longitudenal XOR of the other records within the block. This comes from tape backups that were proof against bad-spots and was later used in RAID.
You then sequence and shoot your data in records across the network. If one record is dropped, it can be recreated through the XOR redundancy record. If two records are dropped, you need a rerequest mechanism. This can be either on UDP or via a separate TCP link.
If you want an example of prior art, go to the trading room of a bank. How do you think all those prices are delivered to every workstation?
See my journal, I write things there
michael, this is not what the product does. From the article:
It appears as though the singal is broken down into equations, that when combined produce the original data. These equations are all sent from the same server to the destination client. The speed increase then comes from the fact that the size of the equations is less than the size of the data.
The article does not mention that the equations come from multiple servers, which is a very big difference! IMO, this technology is much more newsworthy than yet another multi-server downloading tool like Kazaa.
----- rL
The EETimes article is extremely poorly written.
.. it just continues listening. This is especially cool for multicast transmission since even if receivers A and B miss different parts of the message, the broadcaster doesn't have to send the missed parts of the message to the different receivers - it just continues broadcasting since any part can substitute for any other part.
The technique used by Digital Fountain is called Forward Error Correction. It allows a message M with m parts to be encoded into n parts, where n > m. The interesting thing about this is that any m of the n parts will allow the original message M to be reconstructed.
This means that if a part of the message is missed, the receiver doesn't have to request a resend
ok, people, you seem to be far away from your thinking apparatus.
compression takes data, applies an algorythm to it to generate new data that is representation of the whole.
contrary to that, this transforms the data to an algorythm that simulates the whole.
it is a minor point of order, but one worth thinking about because it is theoretically different.
this is also not distributed, it is point to point, I don't know where the submitter got the distributed model from, I suspect he pulled it out the air, but this is very clearly not that. It requires a machine at each end of the point2point.
however, logically, all data that is not random, can be represented by equations that are smaller than the data. However, the real load exists in the generation of the equations from the data, and not the reconstruction of the data itself, so for me this seems to be quite a possible project, though i suppose it will take quite a bit optimization.
one other thing, the article does not say that it uses a standardized technique, and it would be interesting if they did not use standard vector analysis or the like. If they used vectors, then this could just reduce to a system of file conversion from one standard to another. I think it would be far more interesting to be what it says it is supposed to be a file conversion to mathematical simulation of the file.
Maybe I'm misunderstanding the quote, but I semi-regularly download 600+ MB iso images and other multiple-hundred-meg files over ftp links at work, and I've never had problems. It's probably just me, but that really sounds like a load of bull, designed to whip up interest in an otherwise lack-lustre product (given it's high price).
Cheers,
Tim
It's official. Most of you are morons.
See http://www.digitalfountain.com/getDocument.htm/tec hnology/DF_MetaContentWhitePaper_v4.pdf
http://www.ietf.org/internet-drafts/draft-ietf-rmt -info-fec-01.txt
The use of Forward Error Correction in Reliable Multicast
Enjoy.
The technique is called Forward Error Correction. I don't know much about it, but I know that you can do things like break up a file of N*B bytes into 2N blocks of B bytes each and then be able to reconstruct the file from any N of the 2N blocks. The GPL'ed Mojonation system uses it, if I recall correctly, to distribute each 32kB chunk of data into eight 8kB blocks allowing reonstruction from any four of them.
Those "Tornado Codes" you mentioned are coauthored by one of the executives of this company (Luby is CTO).
After looking through some of the material on the company's web site, this product appears to be based on LT (Luby Transform) coding. Each encoded packet is the result of XORing a random selected set of segments from the original file. When sufficient packets have been received, they can be used to reconstruct the original file. Insert magic algorithm. The nice thing about this is that the transmitter can continually stream packets, and a receiver can start collecting packets at any point in time. When the receiver has collected sufficient packets, it can reconstruct the original file. Packet ordering is totally irrelevant. You just need enough packets to generate a unique solution. The math for the code has not been published yet, but this is supposed to be a successor to tornado codes, which have been described in the literature.
Mea navis aericumbens anguillis abundat
Since it doesn't use TCP, I'll bet it won't have any problem handling the TCP part of "TCP/IP connections"... The networking end sounds really simple - send the amount of data to expect, wait for confirmation, send all of the data once *without* waiting for confirmation until it's all been sent. That's a lot easier than handling the overhead of tcp, which is the whole problem they're trying to solve.
:) There's a nice little demo at http://www.digitalfountain.com/technology/coreTech nology.htm
BTW, it helps to read the article before posting "insightful" comments.
I wonder how hard it would be to highjack a UDP based session like this. What if bogus packets are injected along with the stream of valid ones. Does the math include any form on encyption? Or is this a tunnel for other protocols? Damn it, we need to move away from clear text protocols, not create new ones!
With the GetRight solution, 100 people downloading from my FTP mirror = 100 TCP streams worth of bandwidth and CPU consumption. With the Digital Fountain solution, 100 people downloading from my mirror = 1 stream worth of bandwidth and CPU consumption.
They called it "Segmented Downloads", ie the program would hit multiple ftp sites for the same file, doing a resume to get different parts at the same time. Heck, it would even locate the mirrors for you.
And yes, it caused a substantial improvement in download speed. It seems thus that the bottleneck is seldom in the last mile.
But this has little to do with the article, which as far as I can tell is mostly gibberish.
My Karma: ran over your Dogma
StrawberryFrog
The "Math" they use is called Forward Error Correction (FEC) and is the same stuff that the Swarmcast distributed download system is based off of (http://sf.net/projects/swarmcast/).
I am the creator of Swarmcast, and we at Onion Networks (http://onionnetworks.com/) already have a product that can provide file transfers just as fast as Digital Fountain, but ours is 3-5.5% more efficient and much much cheaper.
On the open source front, we are working on the next generation of the Swarmcast technology which will combine parallel downloads over HTTP, Multicast IP, and UDP (for tunnelling through NAT). We have started work defining a standard for this system called the "Content-Addressable Web" and hope to see it implemented in everything from Apache to Mozilla.
Please mod this up, people shouldn't be paying $150k for these types of technologies.
You can only solve a linear system of equations with two unkowns and two equations if the equations are independant. And you can't have more independant equations than you have variables. So your calculation is very optomistic. Imagine if I send you the following data:
X+Y=4
X+2Y=5
2X+4Y=10
2X+2Y=8
With 50% packet loss (as opposed to a 50% chance of loss per packet*) you only get two useful equations 2/3 of the time. Of course, these boxes are expensive and can do a lot more number crunching than just linear algebra. But of course, I wouldn't really have expected a good comparison from a company whose veep of marketing said "FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes" Clearly all those game demo sites on the net need to get with the program.
*Computing the chance of loss per packet gives:
1/16 chance of getting all four, 100% success
3/16 chance of getting 3 of 4, 100% success
8/16 chance of getting 2 of 4, 66.666% success
3/16 chance of getting 1 of 4, 0% success
1/16 chance of getting none, 0% success
or 28/48 is a little more than 4/7, slightly worse odds than if you always got 2 of 4.
I expect this system is a little like Raid 5, Used on Hard Drives. Eg, 5 disks, 1 goes down, you still have enough data to restore the failed drive.- This seams simalar to been sent 5 Million UDP packets, 1Million get lost on the way, however, you still can piece together perfectly what you wanted from the remaining 4Million.
All the data doesnt need to come from the same server. I suppose it is like load balancing.
If you have 20kbps to mirror A, and 80kbps to mirror B. Server A and B start spamming you with packets of whatever you are downloading. Then 20% of the total packets you recieve are from A, and 80% of the total are from B. Given all the partial packets you got, you can now reconstruct the porn DivX. This would be faster then getting 100% of the packets from B which would take 25% longer.
There is nothing "proprietary" here. The techniques for reliable transmission of digital data over unreliable media has been a central area of EE research for at least three decades. The unreliable media is now UDP instead of broadcast RF or transmission lines.
Line reliability in the "normal" sense is classified by bit error rates. Here, the analogous rating would be packet dropped per second. So, it seems like a straightforward application of FEC. It is useful for the reasons that they state. TCP transmissions look like this:
Client: Request Data Packet #00001
Server: Transmit Date Packet #00001 then wait for Client to say OK or Retransmit.
Client: Request Data Packet #00002
Server: Transmit Date Packet #00002 then wait for Client to say OK or Retransmit.
....
Client: Request Data Packet #20343
Server: Transmit Date Packet #20343 then wait for Client to say OK or Retransmit.
DONE
The FEC/UDP form would look like this:
Client: Request Data FILE
Server: Transmit FEC Data Packet #00001
Server: Transmit FEC Data Packet #00002
...
Server: Transmit FEC Data Packet #20343
Client: OK...I got it. Stop Transmitting.
DONE
There is no per-packet dialog between server and client. This removes network latency from the transfer equation. Note that FEC over UDP does require redundant information to be transmitted, so there is no free lunch here, but it is certainly likely to be faster than any TCP/IP implementation.
Oh yeah, and you can download our completely patent-free and open source FEC library from here and build your own Multicast or UDP based download system very quickly (provided you get the flow control right :)
--
Justin Chapweske, Onion Networks
Multicast is congestion friendly in that it can inherently seriously reduce bandwidth consumption, but it's obviously not congestion adaptive. I think the easiest (and probably best) way to introduce congestion control in a multicast is to have multiple streams at different rates, e.g., you have five simultaneous multicast webcasts of a Victoria Secret show, and folks can join the mulitcast group with the video quality that best suits the bandwidth and congestion on their pipes. This idea works very well for the Digital Fountain-style software distribution: rather than having a server serving the same stream at 10 different rates, you can have 10 servers streaming their own unsynchronized Tornado encodings, and clients can subscribe to however many their pipes can support. With 10 streams of data coming at you, you can reconstruct the original data ~10 times as fast.
OK folks, here is the "real deal."
Digital Fountain's core technology is called "meta-content (tm)". The meta-content engine produces packets based on the original content, such that if the receiver receives enough packets (just slightly more than the size of the original content), the original content can be recreated. The neat part is that it doesn't matter which meta-content packets you get. If you need to receive 10 packets, you can get 5, miss 5, get another 5, and it works. Or you can get 1, miss 10, get 9, and it works as well. As long as you receive some 10 packets from the "fountain," you can recreate the original content.
Why is this cool? Several reasons. Digital Fountain claims that TCP connections with RTT of 200ms and 2% packet loss have a bandwidth limitation of 300kbps, no matter how fast the actual transport channel is. So you just go to town to full bandwidth with UDP to use up the entire channel, and use Digital Fountain technology so it doesn't matter which 2% of packets get lost, just as long as you transmit enough packets to make up for the packet loss.
OK, why else is this cool? Imagine a Digital Fountain continuously transmitting meta-data on a multicast address. If you want to receive a file, just listen to the multicast address. It doesn't matter when you start listening, just as long as you listen for enough packets to recreate the original file. Multicast file distribution.
Interestingly enough, Digital Fountain has also pioneered multicast on-demand streaming, but the real secret sauce there is something besides meta-content, but meta-content makes it easier.
As some people have mentioned, you can use UDP with FEC to achieve some error correction. But meta-content can handle long burst errors, whereas FEC is only appropriate for short, random errors. You can literally unplug the ethernet, wait a while, and plug it back in, and you're still good to go with Digital Fountain, as long as you listen long enough.
I should mention, DF has something called "Fair Layered Increase Decrease Protocol," or FLID, to keep their UDP burst from blowing away other TCP traffic on the network.
For more information on the basic Digital Fountain technology, see: A Digital Fountain Approach to Reliable Distribution of Bulk Data.
The concept of "sending equations" instead of data is extremely well-known. It's called Forward Error Correction (FEC). FEC is a very simple idea: take the source data and encode it with parity data so that you can reconstruct the original message from any N chunks of it. One form of FEC that even your stereo components might already do is Reed-Solomon encoding; you can look this up in CS textbooks. If you Google for "Luigi Rizzo Vandermonde", you'll find a fast, free C library for FEC that you can use in your own applications.
Digital Fountain was started by Mike Luby, who is something of a luminary in information theory/cryptography. The kernel of their company's IP is "tornado codes", an FEC codec that is leaner and faster than competing codecs. The basis for their protocols, last time I checked, is FLID/DL and RLC. These protocols set up multiple channels (or "sources" if you must), each transmitting random chunks of FEC-encoded files.
The drawbacks to FEC are that it can take a long time to encode data for transfer, which makes FEC hard for some real-time applications like mail, and that the amount of data transferred is going to be some percentage greater than the original file (owing to parity information). But the drawback to FEC file transfers protocols is much more significant: they aren't TCP-friendly.
The whole Internet depends on protocols that have built-in congestion responses that mimic those of TCP. Protocols that don't either starve TCP flows, or are starved by them. Protocols with no real congestion response at all rapidly destabilize Internet links by consuming all available resources. Digital Fountain originally targeted multicast media transfer. Congestion avoidance in multicast schemes is still an open research question. Does this protocol really scale?
More to the point, what's the benefit? There's obvious payoff for FEC in multicast, where backtraffic from ACKs and NAKs quickly overwhelms the group and kills performance. But in unicast-world, where we will all be living for the next decade, TCP and TCP-friendly forward-reliable protocols like SCTP already provide good performance.
Slow news week at EETimes I guess. Or Digital Fountain just hired a PR firm.
For example, lets look at a situation where I need to tranfer data from the east coast to the west coast and my round trip time is 70 ms. If I have a 32 Mbps link, I can send 64K in about 2 ms. So I have to wait 68 ms until I send another 64K. This gives an effective throughput of less than 1 Mbps. Throw in the slightest amount of congestion and things go even further downhill real fast.
Thank you for that detailled explanation. It seems that my shit detector is a little trigger-happy. :-) Your comment on the "magic algorithm" is spot-on. I could think of a way to pre-determine an algorithm which recodes 64k chunks of data by precomputing all possible 64k datastreams and then simply selecting the datastream which represents that 64k (or whatever size). Anyway the other comments in this article have already attacked the idea on its use of UDP and its lack of net-friendliness. I predict a number of angry customers who paid $300k for a pair of these and whos data is getting dropped/throttled like crazy by their ISPs or the ISPs in the middle. :-)
Yup, reading a few answers here i'm already reading up on FEC (why didn't the EETimes mention this?). The nifty thing about this is, that it's also a nice way to store information in a distributed way (think freenet, gnutella, ...) as far as i understand it. This opens a whole new perspective on the whole P2P theme, since a piece of information wouldn't be stored on one host alone, and also because of the implied redundancy, meaning that the information would still survive in that network until a critical number of hosts would be disconnected. Also netload would be distributed on the sending end, opening some interesting ways for congestion control.
"By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
And the process you describe does nothing for showing benefit. Look, we can represent the three original packets using three different packets is a rather pointless endeavor, especially sending all four seems really silly. If you scale that principle at all, pure XORing doesn't work to save any bandwidth. You need a better example to illustrate how this would actually be useful. I don't think you properly understand what the article is getting at...
XML is like violence. If it doesn't solve the problem, use more.
shouldn't be paying $150k for these types of technologies.
:-)
You haven't looked at the outrageous prices of Cisco equipment lately, have you?
Assuming these boxes are for accelerating multicast applications and preserving WAN bandwidth, then US$70,000 per box sounds competitive with Cisco. Put a large one of these boxes at the headend of a multicast, and one box for each LAN where there are a number of receiving clients, and there could be considerable savings in WAN usage.
/. readers don't realize how expensive a WAN link costs, and how those costs jump in large multiples when a PHB demands more bandwidth for his pet project. Especially outside of the US where international WAN links cost upwards of $10,000/month for 2Mbit/second, and to add another 2Mbit/sec will cost another $10,000 plus a long wait for turn-up. If the PHboss absolutely has to roll out a multicast application immediately, I'd throw a handful of these boxes on the network, and not bother with buying more bandwidth for a while.
the AC
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
For example, consider the transfer of a 1 GB file with an average Internet packet loss rate (2%) and global average RTT (just over 200 msec). For this case TCP has an inherent bandwidth limitation of 300 Kbps regardless of link speeds and thus will take more than 7 hours to transfer the file. A Digital Fountain server sending Meta-Content allows almost complete bandwidth utilization. With Digital Fountain, on a fully utilized 1.5 Mbps (T1) line the transfer of the 1 GB file will take about 1.5 hours and on a fully utilized 45 Mbps (T3) line the transfer time will be a little over 3 minutes.
I was a Skeptic, but I'm now that I've read the Digital Fountain white papers: Meta-Content Technology White Paper and TCP White Paper, I'm a True Believer.
I don't pretend to understand all the details, but the technology is based on the Luby transform which was invented by the company's founder. Essentially the content is broken into segments which are then randomly XORed to produce the metacontent. This provides redundancy since each segment of metacontent contains information from several of the original segments. Also the metacontent segments do not require sequential transmission.
Yes, but that doesn't work well in practice. One, by sending an extra packet, you're wasting bandwidth. Issuing an acknowledgement that the first three packets have been received would likely be too late to stop transmission of the fourth packet, otherwise, your complaint about negative acknowledgement being too slow becomes moot.
For another, a large portion of transmission errors are not localized to a single packet, thus providing for a single packet to be in error (parity scheme) is essentially useless, as you'd probably need to send a negative acknowledgement anyway.
XML is like violence. If it doesn't solve the problem, use more.
Morpheus allows for downloads from simultaneous sources.
In fact, I've seen my system upload files that I was still downloading!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
You would still want some form of error checking to make sure the you got all the right data. I would hate to download somehting only to have a corrupt file.
I think the big problem will be in the assembly and disassembly. Instead of the bottle neck being on the network, it will be on yoru box slowing everything else down.
I really don't trust this. I would hope the use integer math. Otherwise, you can end up with some problems due to the fact that you can't represent some numbers IEEE floating point notation (which a lot of computers use to store decimal numbers).
I don't see this taking off. There is too much possibility to mess up.
At the next eco-hypocrisy-meeting, count the private jets used to get to the meeting. Should be interesting to see that
When you remove all the marketing talk from that article and the company website, what remains is essentially a protocol which:
a) Uses UDP instead of TCP and implements its own proprietary flow control by sending the data multiple times
b) Has no streaming capabilities whatsoever.
Also, it's pretty evident that no matter how nifty their algorithm is, the data which needs to be transmitted before the file can be reconstructed needs to be at least as big as the size of the original file. Quite probably a lot more because of redundancy since the receiver can't acknowledge data.
I second the originals' posters feelings about the protocol...
They then find the power of 2 that is just larger than that. That's Y which is equal to 2 ^ N
They then take a ruler, 1 foot long, and record on it two pieces of information:
Insert ruler into padded envelope, and mail...
Or was this just from some old science fiction story???
If you're not part of the solution, you're part of the precipitate.
Or am I completely off my rocker here?
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
Lately in the binaries newsgroups, there has been increasing use of .PAR files. If a large number of .RAR or .ACE files are downloaded but some are incomplete or corrupt, just so long as you have an equal number of .PAR files as there are missing/corrupt files, the original data can be reconstructed, no matter which of the original files are missing. An explanation and a PAR file reconstruction program can be found at:
http://www.disc-chord.com/smartpar/
But where to get the mirror sites from? For example if a new Star Wars trailer hits the net?
An interesting idea here is to distribute the server workload onto the clients downwards:
Swarmcast is such a protocol. It uses forward error correction as well, and I believe some of the guys whose names were mentioned in this discussion, are involved in this as well.
Any expert who can comment on this one?
There are m unique packets, but any group of n of them will decode to the file where m is less than n. So if a packet is dropped, it's OK, as long as less than m-n are dropped. Essentially, there is redundant data, and the only response that needs to be sent is the "Ive got it" response.
__
Do ya feel happy-go-lucky, punk?
/.
Well, RECENTLY, Veronica was mouse-driven.
Of course, if you weren't a young pup still wet behind the ears, you'd remember that Archie in _Archie_Comics_ had that checker-board thingy on the sides of his head, and Veronica didn't. And I've heard that Archie was latently homosexual, and Veronica wasn't (that's why he was always chasing the unattainable Veronica instead of scoring with hot-to-trot Betty).
I'm sure Bhil will correct me if I'm wrong.
--Charlie
Bittorrent
does that! It's not really trying to be a napster or gnutella but a way to allow people to host a high volume website without having a lot of bandwidth. It works seamlessly with mozilla and IE. It's also quite fast unlike the anonymous p2p networks.
Client: Requesting 'kword'. .5Mb.
Server: Kword binary: 1.2Mb; Kword source
Sending source.
Client: Source received. Unpacking. Compiling.
Done.
Uh.. you mean ICMP I think. Widespread blocking of UDP isn't something I've noticed. That would suck considering 90% of the games people are playing online are UDP.
Sigs are awesome huh?
I'd guess they have to send about 2x as much data as the original for this to work. Maybe they're applying a compressor (like gzip) up front, and taking credit for that.
As others have pointed out, there are lots of other schemes in this class. Bear in mind that this is a protocol for broadcast and multicast, not general file transfer.
This could be true genius or mediocre junk depending on the details. That article wasn't clear: Does it have to be N specific packets (in any order) or can it be any N of the transmitted packets?
N specific packets is, well, nothing special. The idea of using negative acknowledgements instead of positive acknowledgements and retransmitting only the missing pieces is nothing new. It was used extensively in pre-Internet days on dedicated connections. Ever heard of Z-Modem? Its been avoided on the Internet because it makes the congestion control problem very difficult. Translating it into symbols is really just saying that they also compress the file first. Whoop de doo.
On the other hand, if it can be ANY set of N of the transmitted packets, well, that's downright incredible. The applications for such a technique are boundless... Everything from finally making multicast viable and desirable to latency and congestion indifferent file transfers to ultra-reliable offline storage.
How would you like a web server that can serve a T3's worth of clients off of a T1 and do it in such a way that the smart web browser can pre-cache data from the server in anticipation of the reader's next click, realtime adaptive based on the documents currently in multicast transmission to somebody else? Oh yeah, the web server can be on the Moon and respond to you as fast as if it was across the street. Genius.
So which is it? Any set of N or a specific set?
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
...as a suitable high-level system for those concerned about data corruption in file transfer? (The P2P system Electronic Donkey 2000 also incorporates a great deal of high-level error checking -- discarding and retransmitting file blocks whose checksums fail and even those which were corrupted on disk since the beginning of the transfer in the event of a restarted transfer).
No kidding. I read the entire article and didn't see just where their innovation lies... behind all the tech jargon and trademarked methods, it just sounds like fancy compression to me. I don't really understand the math behind this, but from what other people are saying, there doesn't seem to be much that hasn't been thought of before.
Something definitely smells fishy when the CEO is saying that FedEx is more reliable than FTP... I've downloaded more 20+MB Linux kernels than I can even count, and not one of the resulting files has ever had a CRC error. I have, however, known many people to have problems with packages they've shipped or recieved via FedEx.
I think the reason chip designers would FedEx their designs is more for security reasons (certified mail and insurance, etc) than reliability ones.
This article just reeked of boneheadedness, IMHO.
There's a difference between "stick with what we know works" and "Stick with the RFCs." RFCs are dynamic. The TCP and UDP ones have both gone through several revisions. The difference is if you stick with the RFCs you have a bunch of engineers from a bunch of different companies, schools, etc. look at your protocol and make sure it can be implemented from the spec. They also try to balance out its effect on internet traffic as a whole.
UDP is a perfectly legitimate method of moving data, especially if you can tolerate loss. It has been the general position of the IETF for years that you shouldn't reimplement retransmission--that's TCPs job. However, from descriptions it sounds more like this is a smart form of connection blasting, which isn't reinventing the wheel but can be hard on networks.
Speaking of that, it isn't the network that breaks because of UDP--it's your program. If you start sending large amounts of UDP data and chewing up the bandwith on my router, according to the spec I can start not even attempting to process the packets. This is the problem with blasting protocls--by taking up more bandwith than TCP for the same data transfer rate they invite administrative blockage.
What's the big deal? Direct connections did this on BBS's a long time ago. Simply, streaming all the data without flow control (Xmodem and early YModem required ACK/NAK), and let the client request the packets that it is missing leisurely. ZModem was the first to allow out of order delivery, from what I recall, and it revolutionized transfers with maximized bandwidth consumption.
The only thing this particular implementation allows is connecting at an arbitrary time and listening to a continuous loop of packets until you get them all. ZModem could easily be modified to do exactly this, except with checksum data per packet rather than XORing chunks of packets or using symbolic representations. I'm not read up on FEC, but if you're transmitting already compressed data (near the bit distribution of entropy), no alternate representation is advantageous than sending the raw data.
Any connection between your reality and mine is purely coincidental.
Yes. CDs encode data using two Reed-Solomon Codes. Each sample (44,100 per second) records two amplitude measurements (one for each channel), as a 16-bit words. Thus there are 4 bytes/sample. Measurements from 6 consecutive samples are grouped together, and encoded using the Reed Solomon code RS(2^8,5). This adds 4 bytes of redundancy to the 24 bytes of data. After that, the 28 resultant bytes of data are interleaved, then encoded in a a shortened Reed-Solomon code that adds another 4 bytes of redundancy. Then one more byte is added that is used for some type of control purpose (i don't know exactly). After that, a translation is used to make the physical reading/writing more feasable, three bits are added for some reason after the translation, then a 27 bit word is added for syncing purposes.
Overall 6 "ticks" of stereo music contain 196 bits of data, which is expanded to 588 bits written to the disc. The end result is that any error burst of 8871 or fewer bits of data on the disc can be corrected.
Actually, if you think about it, this kind of approach would work pretty well for an encrypted/distributed sharing network like Freenet, where individual nodes of the may or may not be available. You can take all of the pieces of the given file & spew them to lots of different parts of the network, then a client can just go around asking different nodes in the network for any pieces of that file until the client has collected enough of the pieces to form the whole.
why do people have to take a technology which is perfectly reasonable and useful within a certain problem domain, and turn it into snake oil?
I read that whole article, just praying I would see the name "Shannon" or the word "congestion" in there, or even TNSTAFL, but nooooo.
What these guys are doing is absolutely wonderful when
- the bandwidth-delay product is greater than the size of the data to be transferred
- upstream bandwidth is far more expensive than downstream bandwidth
- the receiver is stealthed
- the communications link is half-duplex.
- the receiver has cheap, fast*, temporary storage available. (faster than the network, anyway).
- occult knowledge is being transmitted, and the parties wish to have some deniability
What they are doing is absolutely horrible when the network links are shared with other people -- unless they have some other means of congestion control.
That's not unreasonable. Couldn't somebody just say as much?
This sounds like a problem for congestion control - UDP apps frequently don't have TCP-friendly congestion control, so they can get an unfair share of the network's bandwidth, affecting other applications that use TCP or TCP-friendly UDP protocols. FLID sounds like their form of congestion control, but if they are doing better than TCP it's quite likely it is more aggressive in its network impact. However, handling long burst errors sounds like a useful thing - of course, this can mask interesting errors in network configuration, e.g. lack of FEC on a wireless/satellite link, or lack of Frame Relay traffic shaping on a router with a FR link (this happened to a customer yesterday...).
Sending the same packet multiple time leads to a higher volume (you at least double the volume), for actually less fault tolerance.
Suppose you have a file made up of a hundredblocks, and that we have a 5% probability of packet loss.
In case you transmit each packet twice, you have for each packet a 0.05 * 0.05 probability = 0.25% that you loose both. This may seem small, but if you calculate it, you'll see that the probability of that not happening for any of the 100 packets is quite small: (1-0.0025)^100 = 77.8%. So, you have roughly one chance in 4 of not transmitting your file in its entirety. And to achieve this modest result, you needed to transmit 200 packets, rather than 100.
Now compare this to FEC: you transmit a number of additional packets which can be used to compensate for any loss, not just for its companion packet. Just adding 20 more packets brings the total probability of a successful complete transmission to 99.9999% (that's the probability that less than 20 packets are lost out of 120, given an individual loss probability of 5%). So you transmit less packets (120 rather than 200), and you get a higher assurance of overall transmission success.
Say no to software patents.