UDP + Math = Fast File Transfers

← Back to Stories (view on slashdot.org)

UDP + Math = Fast File Transfers

Posted by michael on Thursday December 13, 2001 @01:50AM from the udp-is-painless dept.

Wolfger writes: "A new file transfer protocol talked about in EE Times gets data from one place to another without actually transmitting the data..." Well, the information is transmitted, just not in its original form. I think there are a couple of other places working on similar ideas - wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name.

11 of 449 comments (clear)

Min score:

Reason:

Sort:

Compression? by mi · 2001-12-13 02:00 · Score: 3, Interesting

In essence, is not this the same as file compression? The amount of information is the same
(for those, who remember, what Bit is). It is just, that the usual one character per byte is awfully wastefull. Which is why the various compressors are so effective.

Add a modern data transfer protocol and you may
get some start up money :-)

--
In Soviet Washington the swamp drains you.
Re:Flow Control by Omnifarious · 2001-12-13 02:20 · Score: 3, Interesting

Quite correct. This protocol does not sound at all TCP friendly. It needs some way of dynamically responding to network conditions to be that way. Even something so simple as doing an initial bandwidth test, then rate limiting the UDP packets to 90% of that would be a big help, though for a large file that would still create a lot of congestion problems.

Does anybody know if IPV6 has any kind of congestion notification ICMP messages so stacks can know to throttle applications when the applications are blasting out data that's congesting the network?

--
Need a Python, C++, Unix, Linux develop
Article is wrong by saridder · 2001-12-13 02:22 · Score: 4, Interesting

The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..."

This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed. The only time you get a 1 for 1 ack ratio is if you miss a packet and the window slams shut.

Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's. Frame Relay switches may drop packets if you exceed your CIR and begin bursting, so that whole transfer will never succeed. Therefore you actually waste bandwidth cause the whole transfer is doomed to fail, and the sender will never know it.

Also some routers have WRED configured in their queues, purposely dropping TCP packets to increase bandwidth on a global scale. This would damage the file transfer process as well if it was UDP based, as this system is.

Stick with the RFC's and the tried and true TCP transport system. This company will fail.

--
--- RFC 1149 Compliant.
1. Re:Article is wrong by Minupla · 2001-12-13 03:31 · Score: 5, Interesting
  
  Stick with the RFC's and the tried and true TCP transport system. This company will fail.
  
  You may be right, they may fall on their noses. Or the win with their system might be large enough that we decide that we need to rework the way we do some things. Hard to say at this point.
  
  I do take issue though with your 'Stick with the RFCs' comment. If we stick with what we know works, we will never evolve. If the folks at CERN had had this attitude, we'd still be using GOPHER (side note: my favorate interview question for testing people who claim to have a long time Net experience, 'Explain the difference between Archie and Veronicia'.)
  
  GOPHER was the tried and true text retrieval system. Besides, UDP has an RFC, and is a perfectly legit method of moving data, provided you accept its limitations and ensure you correct for them. TCP has a lot of overhead that is not always required. If our network breaks because someone sends UDP, its we who need to reread our RFCs.
  
  'Nuff Said.
  
  --
  On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
2. Re:Article is wrong by seanadams.com · 2001-12-13 05:35 · Score: 4, Interesting
  
  Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's.
  
  You're missing the point of UDP. UDP is just a *tiny* layer on top of IP, which adds a little extra information (basically the port number) so that the OS can deliver a packet to the right application. UDP can not be compared with TCP - it doesn't provide reliability and flow control, and it has absolutely no notion of a stream of data. If desired, these can be provided in the application layer (see UDP, TFTP, NFS, etc. etc.)
  
  TCP is a reliable transport, but it's much, much more than that. First off, the fact that you're using TCP doesn't make the path between sender/receiver any more reliable. Your packets get dropped just the same as if they were UDP or any other protocol over IP. TCP provides reliability by retransmitting lost packets, but you knew that. It also provides flow control and congestion avoidance - this means detecting when the receiving end (and the router queues in between) are ready for more data, and throttling your transmission rate to match that capacity. It also means being "fair" to other streams sharing the same bottleneck(s). It does this by "backing off" the number of packets in flight, i.e. halving the congestion window, to reduce the data rate. These algorithms are a very active field of research - there's a *lot* more to TCP than meets the eye of a socket programmer.
  
  When TCP loses a packet, that packet must be retransmitted. This is expensive because it means another RTT.
  
  Anyhoo...
  
  You can think of FEC for data transmission as being analogous to RAID 5 for storage. By adding one extra bit (the XOR of the rest of the word) you can lose any single bit and still know it's value. It's very simple. If the word is:
  
  0 1 0 1
  
  And I add an extra "parity" bit, where the parity bit is 1 is the number of ones in the rest of the word is odd, zero if it's even:
  
  0 1 0 1 [0]
  
  I can now lose any one bit (including of course the parity bit). Eg if I have only
  
  0 1 X 1 [0]
  
  Then I know the missing bit is a '0', because if it were '1' then the parity bit would be a zero.
  
  Applying this to data transmission, you can see that by sending just one extra packet, you greatly reduce the chance of having to retransmit anything.
  
  EG if I have to send you 10 packets over a link with 10% packet loss, there's a 65% chance that I'll have to retransmit one of those packets. (and a 10% chance that each retransmitted packet will have to be sent again, and so on).
  
  However if I'm using FEC and I send one extra "parity" packet, then I only have to retransmit if TWO OR MORE packets are lost. The chances of losing TWO out of the eleven packets is only 30%, so you can see that for an overhead of 10%, I've reduced the number of retransmits by a factor of more than two! I hope those figures are right. I used this tool to calculate them. Of course there are a lot of knobs you can turn depending on how much overhead you can afford for the parity information, and what degree of packet loss you want to be able to tolerate.
  
  Anyway, you can see that there are lots of possible improvements/alternatives to TCP - it's an old protocol and a lot of research has been done since RFC 793.
Re:Flow Control by Omnifarious · 2001-12-13 02:33 · Score: 5, Interesting

The 'equations' are broken up into pieces such that if you recieve any N pieces, you can reconstruct the entire file. It's like how some key sharing schemes work. Like Publius for example. Any N 'shares' of the key can be used to reconstruct the entire key. In this case the 'key' is the whole document, and I'm betting they use a different sharing scheme than ones already used for cryptography.

--
Need a Python, C++, Unix, Linux develop
Luby Transform Codes by Detritus · 2001-12-13 02:52 · Score: 5, Interesting

After looking through some of the material on the company's web site, this product appears to be based on LT (Luby Transform) coding. Each encoded packet is the result of XORing a random selected set of segments from the original file. When sufficient packets have been received, they can be used to reconstruct the original file. Insert magic algorithm. The nice thing about this is that the transmitter can continually stream packets, and a receiver can start collecting packets at any point in time. When the receiver has collected sufficient packets, it can reconstruct the original file. Packet ordering is totally irrelevant. You just need enough packets to generate a unique solution. The math for the code has not been published yet, but this is supposed to be a successor to tornado codes, which have been described in the literature.

--
Mea navis aericumbens anguillis abundat
Multicast by srichman · 2001-12-13 03:20 · Score: 4, Interesting

Quite correct. This protocol does not sound at all TCP friendly [yahoo.com]. It needs some way of dynamically responding to network conditions to be that way.
Well, "this protocol" is just multicast (from a network perspective). Though there have been research attempts to introduce pruning in multicast streams, it is inherent a non-flow controllable transmission protocol. If you take offense to the lack of "TCP friendliness" in multicast, then I suggest you complain to Steve Deering, not Digital Fountain.
Multicast is congestion friendly in that it can inherently seriously reduce bandwidth consumption, but it's obviously not congestion adaptive. I think the easiest (and probably best) way to introduce congestion control in a multicast is to have multiple streams at different rates, e.g., you have five simultaneous multicast webcasts of a Victoria Secret show, and folks can join the mulitcast group with the video quality that best suits the bandwidth and congestion on their pipes. This idea works very well for the Digital Fountain-style software distribution: rather than having a server serving the same stream at 10 different rates, you can have 10 servers streaming their own unsynchronized Tornado encodings, and clients can subscribe to however many their pipes can support. With 10 streams of data coming at you, you can reconstruct the original data ~10 times as fast.
Re:heh.. by regen · 2001-12-13 03:57 · Score: 4, Interesting

This is really funny. I used to work for the NYSE (Stock Exchange) and it wasn't uncommon to transmit 10GB of data over night via FTP to a single brokerage. The data that they were transmitting was at least as valuable as chip designs. It would allow you to alter trade position after trades had been transacted but before they cleared. (e.g. cancel a bad transaction or increase the amount of a good transaction)

--
The Economics of Website Security
Re:my take by mr3038 · 2001-12-13 05:30 · Score: 3, Interesting

It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. ...don't have to keep up this conversation: [long conversation removed]
It seems to me that article indeed speaks about network that has high latency but high bandwidth with some loss. How about simply compressing the data and using bigger packets to transmit it? If you can use big enough window while sending data you can push all the data to the network in the beginning. Conversation comes to A) Here it comes A) Done [64 packets, 125MB] B) Okay, listening B) Resend 2,7 and 25 A) Done [3 packets, 6MB] B) OK. Note that A starts sending before getting reply from B. In fact, with fast long-distance connection it could be that A gets to the end before B getting "Here it comes".
I think if we want to speed up file transfer we need an API to tell OS that we're going to send lots of data so make it big packets or the opposite. Currently we just open socket connection to destination and start write()ing. OS has no way to guess whether or not we're going to write 100 or 10e8 bytes. We need a way to tell OS that the data we're sending isn't worth a dime before it's all done so make it big packets to minimize bandwidth wasted to TCP control traffic.
You can opt to waste bandwidth to reduce perceived latency and that's what I think is done here. A sends file twice and in a case some packets were lost the sent copy would be used to fill in missing parts. A has sent missing packet before B had known it's missing it. Yeah, A wasted half the bandwidth for the redundant data that got correctly to the destination at the first time but we aren't interested in that. The key here is to use UDP so that lost packets are really lost instead of automatically resend. This kind of setup increases overall throughput only if latency is the only problem in your network. Perhaps this is needed in the future because increasing bandwidth is easy - not necessarily cheap - but the light crawling inside fiber is so damn slow!

--
_________________________
Spelling and grammar mistakes left as an exercise for the reader.
Re:heh.. by Thomas+Charron · 2001-12-13 08:44 · Score: 3, Interesting

Data transfered to variouse financial routines are routinely transmitted using insecure protocols in an unencrypted format. I've wored in at least 2 positions, one in a creditcard clearinghouse, another at an ECN. This data goes over unsecured channels all the time.

--
-- I'm the root of all that's evil, but you can call me cookie..