Slashdot Mirror


UDP + Math = Fast File Transfers

Wolfger writes: "A new file transfer protocol talked about in EE Times gets data from one place to another without actually transmitting the data..." Well, the information is transmitted, just not in its original form. I think there are a couple of other places working on similar ideas - wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name.

154 of 449 comments (clear)

  1. Kazaa does that by DOsinga · · Score: 2, Informative

    User would go to download a game demo or something, receive pieces from several different places, and knit them together?

    The file sharing networks based on fasttrack technology do that. You download a movie or game from different users at the same time. Kazaa stitches it back together.

    1. Re:Kazaa does that by zachhendershot · · Score: 2

      A program by the name of Download Accelerator does that also. It splits up the file from the same location and downloads different chunks at the same time. Worked pretty well from my limited experience.

    2. Re:Kazaa does that by autocracy · · Score: 2

      Yes, but in many cases it's the sending end that bottle necks. Major FTP site have some serious bandwidth, but during the day it gets split pretty rough. Get it from many sending ends, and suddenly you're pulling all you can handle.

      --
      SIG: HUP
  2. getright by awptic · · Score: 4, Informative

    The name of the program michael is referring to is called getright, which can connect to several known mirrors of a file and download seperate fragments from each.

  3. Other Applications by Keeper+ofthe+Keys · · Score: 2, Informative


    GetRight uses multiple sites to download from and then pieces them back together.

    http://www.getright.com

  4. And cheap, too! by Tsar · · Score: 5, Funny

    The Transporter Fountain sits alongside a switch or router, and one Transporter Fountain is needed at the sending and receiving ends of a connection. Prices will range between $70,000 and $150,000.

    Oh, boy, I'm gonna stop by CompUSA on the way home and grab one of these.

    1. Re:And cheap, too! by "Zow" · · Score: 2

      No kidding - It's kind of like a FAX machine: That'll be about $70,000 if you just want to send or receive, or about $150,000 if you want to send and receive, give or take $10k.

      -"Zow"

  5. Vectors... by CoolVibe · · Score: 2
    Those would transfer the fastest, since they don't consist of bitmapped data, but just the instructions to create the image.

    I wonder what equations are used to convert raw unpredictable streams of data to formulas, and how come that the formulas used aren't bigger than the sent packets themselves? They mentioned XOR, but that just sounds silly, because XOR does nothing with data except do some reversible equation on them which does neither shrink or grow data.

    Does anyone have more info? It does sound interesting though...

    1. Re:Vectors... by CoolVibe · · Score: 3, Insightful
      > Theory that I learned in college indicates that there is a minimum number of bits that represent any information. Once compressed to that point, you can't go any further.

      Exactly. This is also the point where a equasion to represent the data is going to end up bigger than the data its trying to send. But it depends on the algorythm used too. If the data may be sent out of order, one could try block-sorting and then compressing (like bzip2 does), but since this is UDP, out of order packets will be dropped or either not delt with (I think).

      DISCLAIMER: I am not a protocol god, nor am I trying to be. Just spouting my views :-)

    2. Re:Vectors... by hburch · · Score: 5, Informative
      Consider the following (almost certainly bad, but workable) scheme:
      • Convert a chunk of the file into an order-k polynomial (use the coefficients of the polynomial to encode the chunk)
      • Send the evaluation of the polynomial at several distinct locations (more than k+1).
      • Receiver gets at least k+1 packets.
      • Using math, it recreates the original polynomial, and thus the chunk.


      Please note that I'm not saying this is a good scheme. It is just an example one, and one that doesn't detail the chunk polynomial conversion, which would be very important. There are several papers describing schemes where people have actually worked at making them tenable.

      Modulo compression, if you want such a system to require only receiving k packets (although you send more than that), the sum of the size of the k packets must be at least the size of the original file (otherwise, you could use such a scheme to compress the file).
    3. Re:Vectors... by nomadic · · Score: 2

      You mean you can't compress a megabyte file to one byte by pkzipping, the pkzipping the zip file, then pkzipping that file, etc.? Damn, there goes my Nobel, really thought I had a made a breakthrough with that algorithm.

    4. Re:Vectors... by evanbd · · Score: 2

      a far faster scheme would use flat surface in n-space. basically, instead of a curve on the xy plane, if you need to send 4 numbers, you use w = ax + by + cz + d, a-d being the numbers. send points on the surface (at least 4) and the surface can be reconstructed. Not fundamentally different, but you try solving a 27th degree polynomial equation, even with a computer...

      Oh, and your scheme actually only needs the original number of packets -- order k polynomial stores k+1 numbers (order 2 is ax^2 + bx +c, gives three numbers; you need three points to recreate. three equations, three unknowns)

    5. Re:Vectors... by gowen · · Score: 2
      Not fundamentally different, but you try solving a 27th degree polynomial equation, even with a computer...
      Nobodies suggestion the data is hidden in the roots (zeroes) but the coefficients. Whilst finding the roots of an order 27 polynomial is hard (and Galois theory says "forget it", at least if you want exact solutions), interpolating one through 28 data-points is easy...
      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    6. Re:Vectors... by jtdubs · · Score: 2

      Let me get this straight, we use the bytes of data as coefficients in a huge polynomial. The number of terms in the polynomial equals the number of bytes in the message, and we call that k. We then sample the polynomial at k+1 various places and send the samples to the receiver. It reconstructs the polynomial and hence the data.

      If we have k coefficients to the polynomial, why would we send k+1 samples rather than just the k coefficients. This would alleivate the need of the receiver to reconstruct the polynomial and would have one less piece of data to send.

      Hey, wait a minute, if the k coefficients are the actual data, and that's what we are sending, then we are just sending the data.

      Wasn't that easier AND smaller than the polynomial idea. :-)

      You should wander over to sci.crypt and sci.crypt.random-numbers. Maybe you can convince them that this method will let you compress random data also.

      To elaborate, here's the general problem you are running in to:

      With any sufficiently random data, the size of the polynomial tends to be equal to the size of the data and hence doesn't help you. In fact, with any pattern you can find in the data, it will tend to take as much or more space to encode what pattern you found than just to encode the original data. That's the nature of the beast.

      Justin Dubs

    7. Re:Vectors... by gowen · · Score: 2
      if the k coefficients are the actual data, and that's what we are sending, then we are just sending the data.
      What you say is true, but you've missed the point. If you send the k polynomial coeffs and one gets lost, I have to ask you explicitly for the dropped one. If you send k+1 sampled data points and any one gets dropped, I can still reconstruct the data.
      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  6. Compression? by mi · · Score: 3, Interesting

    In essence, is not this the same as file compression? The amount of information is the same
    (for those, who remember, what Bit is). It is just, that the usual one character per byte is awfully wastefull. Which is why the various compressors are so effective.

    Add a modern data transfer protocol and you may
    get some start up money :-)

    --
    In Soviet Washington the swamp drains you.
    1. Re:Compression? by s20451 · · Score: 4, Informative

      In essence, is not this the same as file compression? The amount of information is the same (for those, who remember, what Bit is).

      It is more than merely compression. The received data is compressed, which saves transmission time, but this technology is already well known (and the company isn't claiming a compression rate better than entropy, or anything else silly). The innovation here is the elimination of acknowledgement or ARQ packets. I'm speculating here, but it looks like they are encoding the data by transforming a file into a huge "codeword" -- when the codeword is transmitted, the receiver waits for enough packets to correctly decode the codeword, which results in the recovery of the file. There's no need for ARQ or TCP because transmitting extra codeword elements will automatically correct any errors incurred in transmission.

      --
      Toronto-area transit rider? Rate your ride.
  7. Flow Control by Detritus · · Score: 3, Informative

    You still need some form of flow control or rate limiting, otherwise a large percentage of the UDP packets are going to get dropped. Plus, you have the problem of UDP streams stealing bandwidth from TCP streams on a limited bandwidth link.

    --
    Mea navis aericumbens anguillis abundat
    1. Re:Flow Control by Omnifarious · · Score: 3, Interesting

      Quite correct. This protocol does not sound at all TCP friendly. It needs some way of dynamically responding to network conditions to be that way. Even something so simple as doing an initial bandwidth test, then rate limiting the UDP packets to 90% of that would be a big help, though for a large file that would still create a lot of congestion problems.

      Does anybody know if IPV6 has any kind of congestion notification ICMP messages so stacks can know to throttle applications when the applications are blasting out data that's congesting the network?

    2. Re:Flow Control by M100 · · Score: 2

      It doesn't matter if you lose data in the stream. You actually send more data than the original file - but the receiver has to receive approx the same amount of data as the original file (about 5% if I remember correctly). Thus you can send 100kbytes (for a 50k file) but the receiver only needs to get half the packets - so loss is not a problem.
      Think about simultaneous equations. If I have 2 unknowns I can solve it if I have two equations. Now imagine that I send you 4 equations, each with the two variables. You only need to receive 2 of the equations to be able to reconstruct the original two pieces of data.
      The other neat thing about this is that you can multicast traffic - and each receiver can start listenin when it wants - so if a receiver starts listening halfway through you can still get the whole file!

    3. Re:Flow Control by Omnifarious · · Score: 5, Interesting

      The 'equations' are broken up into pieces such that if you recieve any N pieces, you can reconstruct the entire file. It's like how some key sharing schemes work. Like Publius for example. Any N 'shares' of the key can be used to reconstruct the entire key. In this case the 'key' is the whole document, and I'm betting they use a different sharing scheme than ones already used for cryptography.

    4. Re:Flow Control by kmacleod · · Score: 2, Interesting

      I think you missed the point. Sure, you need only to sip from the fire hose to finally quench your thirst, but that wastes a lot of water.

      Where TCP acks and controls flow on every single packet, which also must be received by the sender in order, a protocol like this can occasionally send back a report on what kind and how many packets it is receiving (fragmented, packet loss due to bandwith, packet loss due to congestion).

    5. Re:Flow Control by bpowell423 · · Score: 2

      It looks to me like they don't care if packets get dropped. The sender just keeps jamming stuff down the pipe until the receiver gets enough. Yeah, seems to me they need some kind of flow control, and I definitely think it'd be a bandwidth hog.

  8. Kinda like IFS? by pointym5 · · Score: 2, Interesting

    I mean it's not for image compression specifically, but it definitely reminds me of IFS image compression in some ways. I'll bet that compression is very time consuming, but that's fine if you're warehousing data. I wonder if the clients are pre-loaded with a body of parameterized functions, so that the server just sends information describing what functions to run and what the parameters are. I guess if it's all based on polynomials all it needs to send are vectors of constants.

    Neat idea. Patents: here and here.

  9. heh.. by Xzzy · · Score: 5, Insightful

    > These files routinely are mailed on tape rather
    > than transmitted electronically. "FedEx is a
    > hell of a lot more reliable than FTP when
    > you're running 20 Mbytes,"

    Having worked in the industry they mention, I'd hazard that they don't use ftp more because of the illusion of security than anything else. People in the EDA world (which is where I worked, and has a close relationship with chip manufacturers) are immensely paranoid about people getting ahold of their chip designs, because if someone steals that.. you not only lose your next chip, you enable someone else to make it for you.

    These people just don't trust firewalls and ftp yet, but they do trust putting a tape in an envelope and snail mailing it. At the very least it makes someone liable if the letter gets stolen, which you can't do with electronic transfers..

    At any rate, ftp is plenty reliable for transfering 20mb files.. I do it every time a new game demo comes out. :P Maybe they meant 20gb. Cuz I've seen chip designs + noise analysis + whatever take dozens of gigs.

    1. Re:heh.. by swb · · Score: 2

      These people just don't trust firewalls and ftp yet, but they do trust putting a tape in an envelope and snail mailing it.

      I've heard this said about the diamond business and the postal service. Diamond couriers, who are carrying just diamonds, can be tracked and robbed easily. Once a package enters the postal stream its nearly impossible to steal that specific package.

      I dunno if its really true or not, but it has a certain counterintuitive logic that makes it believable.

    2. Re:heh.. by regen · · Score: 4, Interesting

      This is really funny. I used to work for the NYSE (Stock Exchange) and it wasn't uncommon to transmit 10GB of data over night via FTP to a single brokerage. The data that they were transmitting was at least as valuable as chip designs. It would allow you to alter trade position after trades had been transacted but before they cleared. (e.g. cancel a bad transaction or increase the amount of a good transaction)

    3. Re:heh.. by hardburn · · Score: 2

      Please tell me that data was encrypted first.

      --
      Not a typewriter
    4. Re:heh.. by austad · · Score: 2

      Why don't they transfer the files over HTTPS?


      That's not secure either. SSL sessions can be intercepted. Just take a look at ettercap, search for it on freshmeat.net. After playing with this on your bridged DSL connection a little bit, you'll realize that SSL sucks.

      Strong encryption via PGP or some other strong method would be the way to go.

      --
      Need Free Juniper/NetScreen Support? JuniperForum
    5. Re:heh.. by MeerCat · · Score: 3, Informative

      TCP/IP adds a 16-bit checksum to the packets. This will generally detect an error burst of 15 bits or less, if the data is uniformly distributed then this will accept a corrupt segment with probability 1 / (2^16-1). [Snader p70]. This was designed to catch bugs in routers etc. (which may write the wrong data when forwarding packets) rather than catch all data corruption on the wire.

      Depending on how much noise goes undetected at your physical layers, you should expect a TCP session to pass thru incorrect data about 1 in 10^9 to 10^12 bytes passed (thats the metric I use) - and if this is unacceptable then your application layer should check and/or correct data itself, bearing in mind the end-to-end argument for system design.

      T

      --
      I spent a lot of money on booze, birds and fast cars. The rest I just squandered. - George Best
    6. Re:heh.. by Namarrgon · · Score: 2

      Even Zmodem used a 32 bit CRC. Hell, why not just zip the thing?

      --
      Why would anyone engrave "Elbereth"?
    7. Re:heh.. by Jobe_br · · Score: 2

      Here's the link in case you want to try this out:

      http://freshmeat.net/projects/ettercap/"

    8. Re:heh.. by Thomas+Charron · · Score: 3, Interesting

      Data transfered to variouse financial routines are routinely transmitted using insecure protocols in an unencrypted format. I've wored in at least 2 positions, one in a creditcard clearinghouse, another at an ECN. This data goes over unsecured channels all the time.

      --
      -- I'm the root of all that's evil, but you can call me cookie..
  10. DAP? by BIGJIMSLATE · · Score: 2

    "User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name."

    Uh...doesn't something like Download Accelerator Plus (yeah yeah, I know its a hive of spyware) already do that (downloads from multiple locations only to recombine the file later)?

  11. Michael, did you even read it? by Chirs · · Score: 5, Informative

    Guys, this is nothing like Kazaa. Kazaa will let you download from several sources simultaneously, but only because it just requests different parts of the file from each source. At that point there are still send/ack type protocols in use.

    This technology (from the write-up anyway) uses some kind of proprietary technique to re-map the data into another domain and send the information required to reproduce it. It sounds kind of like sending a waveform as a series of Fourier coefficients rather than as actual data samples. By changing to a different domain, it is possible to send metadata from which the original information can be recreated.

    I have no idea exactly how they would do it, but it doesn't sound completely impossible.

    However, its nothing like Kazaa or GetRight.

    1. Re:Michael, did you even read it? by no_such_user · · Score: 3, Informative

      IANADP (i am not a dsp programmer...), but if I remember correctly, the process of the Fourier transform doesn't reduce data. The transform brings our data into the frequency domain, which, in terms of audio and video data, is where we do all of the tricks to eliminate what we've learned our "ears" can't hear or "eyes" can't see.

  12. Swarmcast by Webmonger · · Score: 4, Informative

    I believe Michael's talking about OpenCola's Swarmcast.

  13. Name of company and product by Omnifarious · · Score: 4, Informative

    The company's name is OpenCola and the name of the product was SwarmCast. The guy who did SwarmCast, Justin Chapewske, is now at a company he started named Onion Networks. OpenCola appears to have completely abandon its original Open Source approach to their software.

    Apparently, Justin has taken the GPL portions of Swarmcast and is improving them at Onion Networks.

  14. Debian by Some+guy+named+Chris · · Score: 2

    Debian does something similar with the Pseudo Image Kit.

    It gets all the parts of the install ISO cd image, from disparate sources, stitches them together, and then uses rsync to patch it to exactly make a duplicate of the original install image.

    Very nifty.

  15. Not a new concept by KarmaBlackballed · · Score: 2, Informative

    The quirk is that none of the data is ever transmitted; the receiving end creates its own copy of a file based on a complete set of mathematical equations.

    This is called compression. Everybody is doing it and it has been done before.

    When you download a ZIP file, you are not downloading the content. You are downloading a mathematically transformed version of it. You then translate it back. Modems have been compressing and decrompressing on the fly since the late 1980s.

    Maybe they have a better compression scheme? (Fractal based?) That would be news. Everything else is a distraction.

    --

    --- -- - -
    Give me LIBERTY, or give me a check.
    1. Re:Not a new concept by Omnifarious · · Score: 4, Informative

      UDP drops packets. What they are saying is they can packetize things in such a way that as soon as you pick up any N packets, you get the file, no matter what. They are also implying that anything less than N packets leaves you gibberish. This is quite different from file compression. It may be related to fractal file compression, but I think it's probably more related to cryptographic key sharing schemes.

    2. Re:Not a new concept by tzanger · · Score: 2

      Maybe they have a better compression scheme? (Fractal based?) That would be news. Everything else is a distraction.

      I remember back in the good old BBS days of a program called OWS which did "fractal compression"... In reality, it deleted the file and stored the sector information of the deleted blocks in the "compressed file" -- if you moved the .ows file off the disk, you'd get "decompression" errors. :-)

    3. Re:Not a new concept by Omnifarious · · Score: 2

      It won't work if they constantly resend the same packet over and over. They need it set up so that as soon as you get any N pcakets, you can reconstruct the file. This means that each packet needs to be different from all the other packets.

      So if one side sends the other 6 packets before recieving the 'stop' request, and only 5 are needed to send the file, it doesn't matter which 5 of the 6 are recieved. No packets are sent from the reciever during the transfer, only at the beginning and end.

      And, yes, it is a bandwidth hog, but for reasons that are rather different from the ones you imagine. It has no provision for congestion control, which means that it will keep blasting away with UDP packets at a congested router (keeping it congested) rather than backing off until the congestion ha cleared.

    4. Re:Not a new concept by GrEp · · Score: 2

      A tech paper on their "tornado codes". Also, the link to their tech papers website.

      Didn't have much chance to look over the algorithms, but we really should have compression used more frequently in network transport.

      --

      bash-2.04$
      bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
  16. Re:OOPS, name of person slightly wrong by Omnifarious · · Score: 3, Informative

    Oops, make that Justin Chapweske. That's what I get for typing out an odd name from memory. :-)

  17. Basically by glowingspleen · · Score: 2

    For anyone that just wants the jist of the article:

    "The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. "

    So basically, it's not much more than UDP with a single reply telling the server to stop transmitting.

    Not bad, but you better have some good timeouts worked into this thing. UDP by definition is a non-replying "if it gets dropped who cares?" protocol. If the receiver's connection were to go down, wouldn't the server just get flooding all the in-between routers with packets for awhile? That's not good for traffic congestion.

  18. Just send numbered UDF Packats by seanmceligot · · Score: 2, Interesting

    This could be done easily without the proprietary algorithms. Just send update packets with a header in each on stating that it is packet number N and there are X total packets. Then, request missing packets when you get towards the end, and put them all together in order when you get them all.

    Somewhat unrelated --- Does anyone else miss Z-Modem. We need a zmodem like program for that works over telnet so we don't have to open a separate FTP session. In the BBS days, you just typed rz filename and it came to you.

  19. my take by bpowell423 · · Score: 5, Informative
    There have been lots of comments along the lines of, "this is just a novel compression/transmittion scheme". In a way, that looks to be true, but here's my take.

    Judging from this:

    The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. The receiving box then performs an XOR operation on the symbols to derive the original data.

    It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. Apparently there are multiple solutions to the equations that will arrive at the same answer, so when the receiving end has received enough symbols to make it works it says, "stop sending already!" Apparently they're getting their speed because A) things don't have to go in any order (that's how the 'net is supposed to work, right?) and B) Alice and Bob don't have to keep up this conversation: Alice: Hey, Bob, can you send me X? Bob: Okay, are you ready? A: Yes, Go ahead? B: Okay, here it comes. A: I'm waiting. B: Here's the first packet." A: What? That packet didn't make it over. B: Okay, here it is again. A: Okay, I got that packet. B: Good. A: Okay, I'm ready for the second packet. B: Okay, here's the second packet.

    Okay, I had too much fun with the Alice and Bob conversation there. Anyway, it looks like there scheme is compressing things in the form of their equations, and then just sending them in a burst until the receiver is happy.

    Sounds like it might work, but it'll generate a ton of network traffic, I'd bet!

    1. Re:my take by mr3038 · · Score: 3, Interesting
      It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. ...don't have to keep up this conversation: [long conversation removed]

      It seems to me that article indeed speaks about network that has high latency but high bandwidth with some loss. How about simply compressing the data and using bigger packets to transmit it? If you can use big enough window while sending data you can push all the data to the network in the beginning. Conversation comes to A) Here it comes A) Done [64 packets, 125MB] B) Okay, listening B) Resend 2,7 and 25 A) Done [3 packets, 6MB] B) OK. Note that A starts sending before getting reply from B. In fact, with fast long-distance connection it could be that A gets to the end before B getting "Here it comes".

      I think if we want to speed up file transfer we need an API to tell OS that we're going to send lots of data so make it big packets or the opposite. Currently we just open socket connection to destination and start write()ing. OS has no way to guess whether or not we're going to write 100 or 10e8 bytes. We need a way to tell OS that the data we're sending isn't worth a dime before it's all done so make it big packets to minimize bandwidth wasted to TCP control traffic.

      You can opt to waste bandwidth to reduce perceived latency and that's what I think is done here. A sends file twice and in a case some packets were lost the sent copy would be used to fill in missing parts. A has sent missing packet before B had known it's missing it. Yeah, A wasted half the bandwidth for the redundant data that got correctly to the destination at the first time but we aren't interested in that. The key here is to use UDP so that lost packets are really lost instead of automatically resend. This kind of setup increases overall throughput only if latency is the only problem in your network. Perhaps this is needed in the future because increasing bandwidth is easy - not necessarily cheap - but the light crawling inside fiber is so damn slow!

      --
      _________________________
      Spelling and grammar mistakes left as an exercise for the reader.
    2. Re:my take by Courageous · · Score: 2

      It seems to me that article indeed speaks about network that has high latency but high bandwidth with some loss.

      You are almost certainly correct. Similar technologies that I've looked at in the past expand individual packet sizes by about 20% in order to do forward error correction and thereby save on NACK/resends. This becomes particularly valuable when latencies are long, of course, which is what you guessed the application is for.

      What's not said here is how much of the perceived problem they described is really just a problem with TCP/IP windowing. When effective bandwidth drops to a few percent of the channel bandwidth, this is almost always because the TCP/IP window buffer has been overrun.

      This, of course, is particularly noticeable on high bandwidth, very high latency networks of the type you see on some satellites.

      C//

    3. Re:my take by mr3038 · · Score: 2
      write the entire file to the socket at once [...] the sendfile() call

      (man sendfile, man 7 tcp) Hmmm.... TCP_CORK socket option seems interesting too. Unfortunately there also reads: "Other Unixes often implement sendfile with different semantics and prototypes. It should not be used in portable programs." and "TCP_CORK is new in 2.2". Does POSIX have any support for directing TCP/IP-stack about the content of data to optimize packet size and stuff or is there some another way to do this in portable way?

      --
      _________________________
      Spelling and grammar mistakes left as an exercise for the reader.
  20. fastest by cr@ckwhore · · Score: 4, Funny

    Someday when we all have extraordinarily fast computers, we'll simply be able to send somebody an MD5 sum and the computers will be able to "crack" it back into the original file. At that point, commercial software wouldn't even have to come on a CD... just print the hash on a slip of paper and the user could type it in.

    word.

    --
    Skiers and Riders -- http://www.snowjournal.com
    1. Re:fastest by tzanger · · Score: 2

      Someday when we all have extraordinarily fast computers, we'll simply be able to send somebody an MD5 sum and the computers will be able to "crack" it back into the original file.

      Not familliar with hashes? You can't do it quite like that because an MD5 hash is the same for any number of datasets. Trying to un-do a hash is pretty idiotic at that level.

      However, think about this scenario: Send the MD5, byte count and CRC64. Now rattle through all the possible combinations of "byte count" of data that generate the same MD5 and CRC64 and values as were sent. It's still not foolproof but you've reduced the dataset considerably. Of course, now you compare your new file to the GPG sig sent to ensure you got it right.

      Perhaps as we approach quantum computing it will become more realistic to brute-force a many-hundred-megabyte file from a number of its hashes.

    2. Re:fastest by cduffy · · Score: 2

      Heh... that raises an interesting question.

      If the set of "runnable programs" is small enough, it would be interesting to be able to find all runnable programs of less than a given length having a given MD5 sum. (Of course, strings, images and such would complicate the situation drastically -- so let's just leave those out and use this for sending code segments only).

      Of course, using a hash created explicitly to be one-way doesn't make much sense in such a situation... but just as a thought experiment it's interesting.

    3. Re:fastest by Bronster · · Score: 2

      Not familliar with hashes? You can't do it quite like that because an MD5 hash is the same for any number of datasets. Trying to un-do a hash is pretty idiotic at that level.

      So - most of those cracked datasets are going to be an invalid file for the format (which we can deduce from the 3 letter filename extention - usually .tmp or something depending on your email client's handling of temporary filenames ;( ).

      Any decoded files can then be checked (it's supposed to be a JPEG, does it look anything like a picture? Not it then.

      Anything that passes all those filters can then be sold as art, there's gotta be some sucker out there willing to buy.

      Meanwhile, I have a super fast computer which can crack MD5s into lots of art forms, and I would proceed to break into the FBI before getting shot in the head (but not before getting a blowjob from the yummy blonde (there's always one)).

  21. Entry level is $70k by joshv · · Score: 2

    Yeah, I don't ftp is so slow that anyone is going to pay $70k for their proprietary 'Transporter Fountains'. Seems like anyone with a little common sense and math ability could easily cobble together a software UDP based transfer protocol that has all of the properties described in the article.

    The key is to build in redundancy without increasing the amount of data sent so much that you counteract the speed gains you get by using UDP.

    -josh

  22. FTP already provides for this by autocracy · · Score: 2, Informative

    Just send restart at commands to many different servers, then cat the files onto each other. This is how Dowload Accelerator does it, and Fast Track is the same theory. Programs just take all the mental work out of it.

    --
    SIG: HUP
    1. Re:FTP already provides for this by autocracy · · Score: 2

      This response was not to the article, just the others who posted info about "$PROGRAM does this!" See my other comment in this story.

      --
      SIG: HUP
  23. Article is wrong by saridder · · Score: 4, Interesting

    The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..."

    This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed. The only time you get a 1 for 1 ack ratio is if you miss a packet and the window slams shut.

    Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's. Frame Relay switches may drop packets if you exceed your CIR and begin bursting, so that whole transfer will never succeed. Therefore you actually waste bandwidth cause the whole transfer is doomed to fail, and the sender will never know it.

    Also some routers have WRED configured in their queues, purposely dropping TCP packets to increase bandwidth on a global scale. This would damage the file transfer process as well if it was UDP based, as this system is.

    Stick with the RFC's and the tried and true TCP transport system. This company will fail.

    --
    --- RFC 1149 Compliant.
    1. Re:Article is wrong by StormyWeather · · Score: 2, Insightful

      Sliding windows is flow control, not error recovery.

      I would trust UDP with anything I would trust TCP with as long as the application does the error checking on the data, which is exactly what they are saying their product does. TCP is really high overhead compared to UDP, and not always necessary. One of the reasons for TCP was so that programers wouldn't have to deal with as much, but if you can make something that handles it more efficiently then you only have to send a retransmit request whenever there is lost data, and not after every window.

      Maybe it's my tendacy to fight for the underdog but I feel UDP has gotten the shaft. It's a great way to slam traffic around, and as secure as your application is written to make it.

      Nice little doc over TCP and sliding windows for anyone that might want one.

    2. Re:Article is wrong by Minupla · · Score: 5, Interesting

      Stick with the RFC's and the tried and true TCP transport system. This company will fail.

      You may be right, they may fall on their noses. Or the win with their system might be large enough that we decide that we need to rework the way we do some things. Hard to say at this point.

      I do take issue though with your 'Stick with the RFCs' comment. If we stick with what we know works, we will never evolve. If the folks at CERN had had this attitude, we'd still be using GOPHER (side note: my favorate interview question for testing people who claim to have a long time Net experience, 'Explain the difference between Archie and Veronicia'.)

      GOPHER was the tried and true text retrieval system. Besides, UDP has an RFC, and is a perfectly legit method of moving data, provided you accept its limitations and ensure you correct for them. TCP has a lot of overhead that is not always required. If our network breaks because someone sends UDP, its we who need to reread our RFCs.

      'Nuff Said.

      --
      On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
    3. Re:Article is wrong by srichman · · Score: 2
      The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..." This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed.
      TCP still acknowledges every packet; it just batches acknowledgements as an optimization. An important point, though, is that if you starting dropping packets (or if they arrive out of order) at a moderate rate (could even be a small loss rate, depending on your delay and bandwidth), fast retransmit (introduced in TCP Reno) will cause you to acknowledge every packet.

      In contrast, UDP never acknowledges (or disacknowledges) a packet.

    4. Re:Article is wrong by seanadams.com · · Score: 4, Interesting

      Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's.

      You're missing the point of UDP. UDP is just a *tiny* layer on top of IP, which adds a little extra information (basically the port number) so that the OS can deliver a packet to the right application. UDP can not be compared with TCP - it doesn't provide reliability and flow control, and it has absolutely no notion of a stream of data. If desired, these can be provided in the application layer (see UDP, TFTP, NFS, etc. etc.)

      TCP is a reliable transport, but it's much, much more than that. First off, the fact that you're using TCP doesn't make the path between sender/receiver any more reliable. Your packets get dropped just the same as if they were UDP or any other protocol over IP. TCP provides reliability by retransmitting lost packets, but you knew that. It also provides flow control and congestion avoidance - this means detecting when the receiving end (and the router queues in between) are ready for more data, and throttling your transmission rate to match that capacity. It also means being "fair" to other streams sharing the same bottleneck(s). It does this by "backing off" the number of packets in flight, i.e. halving the congestion window, to reduce the data rate. These algorithms are a very active field of research - there's a *lot* more to TCP than meets the eye of a socket programmer.

      When TCP loses a packet, that packet must be retransmitted. This is expensive because it means another RTT.

      Anyhoo...

      You can think of FEC for data transmission as being analogous to RAID 5 for storage. By adding one extra bit (the XOR of the rest of the word) you can lose any single bit and still know it's value. It's very simple. If the word is:

      0 1 0 1

      And I add an extra "parity" bit, where the parity bit is 1 is the number of ones in the rest of the word is odd, zero if it's even:

      0 1 0 1 [0]

      I can now lose any one bit (including of course the parity bit). Eg if I have only

      0 1 X 1 [0]

      Then I know the missing bit is a '0', because if it were '1' then the parity bit would be a zero.

      Applying this to data transmission, you can see that by sending just one extra packet, you greatly reduce the chance of having to retransmit anything.

      EG if I have to send you 10 packets over a link with 10% packet loss, there's a 65% chance that I'll have to retransmit one of those packets. (and a 10% chance that each retransmitted packet will have to be sent again, and so on).

      However if I'm using FEC and I send one extra "parity" packet, then I only have to retransmit if TWO OR MORE packets are lost. The chances of losing TWO out of the eleven packets is only 30%, so you can see that for an overhead of 10%, I've reduced the number of retransmits by a factor of more than two! I hope those figures are right. I used this tool to calculate them. Of course there are a lot of knobs you can turn depending on how much overhead you can afford for the parity information, and what degree of packet loss you want to be able to tolerate.

      Anyway, you can see that there are lots of possible improvements/alternatives to TCP - it's an old protocol and a lot of research has been done since RFC 793.

    5. Re:Article is wrong by seanadams.com · · Score: 2

      (see UDP, TFTP, NFS, etc. etc.)

      Er... I meant (see DNS, TFTP, NFS, etc. etc.)

    6. Re:Article is wrong by evilviper · · Score: 2
      We send recipes, not pieces of content," said Clifford Meltzer, chief executive
      of Digital Fountain. "Once you get enough [of the packets] coming in, Spock
      appears. If you get 98 percent of the packets, you get nothing.


      I think you are misreading this quote... They are just saying that you can't download part of a file and have the file work partially... I.E. downloading a movie on Gnutella often results in half-movies when the server disconnects, but you are still able to watch the first half of that video. This doesn't imply anything about a limitation in UDP.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    7. Re:Article is wrong by seanadams.com · · Score: 2

      But what if the parity packet is lost?

      Parity bits are also used in serial data transmission to confirm data inegrity - if the parity bit is wrong, then you can be sure that there's something wrong with the link layer.

      In this case, the parity packet is being used to recover *lost* data, not to confirm data integrity. You would still have a per-packet checksum to ensure that each packet is valid.

      Bit errors are extremely rare on the Internet. Normally when there is a significant chance of data corruption, say on a wireless or analog modem line, you would have strong error checking in the link layer.

      Lost packets, on the other hand, are very common . In fact they're an intended consequence of TCP - you send data as fast as possible until packets are dropped, and then you back off. If you're getting lots of loss, it doesn't mean there's something "broken" with the network, it just means that the link is over-saturated.

    8. Re:Article is wrong by seanadams.com · · Score: 2

      Bit errors aren't that uncommon. Do a sh int serial[#] on a Cisco router and you'll see a number of errors on that line.

      Right - it's detecting the error at the link layer (using HDLC or PPP checksums, I guess), and dropping the packet at that point. The Cisco doesn't forward the packet if it knows it's corrupted.

      So a partiy packet recovers lost packets? But are there seq. numbers for the parity packet? How do it know that this parity packet is for what pevious packets?

      I don't know the details of any protocols that do this, but yes, there would have to be something like sequence numbers to associate packets with the correct parity information. I haven't read the article yet (shame on me) but it sounds like this protocol is much more sophisticated than the one I've described.

  24. PostScript for data by swb · · Score: 2

    This sounds a lot like what PostScript is to a rasterized file. A set of descriptions of what the picture looks like, which are small and easy to transmit, which are then drawn to produce the picture.

    With real vector PS its easy, since you start out by creating vectors (eg, Adobe Illustrator). How you get from a non-vector "destination" to the metadata you really want to transmit sounds complicated.

  25. Udpcast by BlueUnderwear · · Score: 2

    Udpcast in FEC mode does this too: in addition to the original data, it can transmit an arbitratry amount of "FEC" blocks which are a linear combination of the data blocks. If some data blocks are lost in transit, udpcast can recalculate them from the FEC blocks by multiplying the vector of received data by the inverse encoding matrix.

    --
    Say no to software patents.
  26. XOR = advanced algorithm by null+etc. · · Score: 2, Informative
    Quoted from the article:
    In this case, the Transporter Fountain creates not equations but hundreds of millions of "symbols" which can be used to reconstruct the data. The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. The receiving box then performs an XOR operation on the symbols to derive the original data.
    So, assuming that each "symbol" is at least one byte, then creating "hundreds of millions" of these symbols would result in hundreds of megabytes of data. Furthermore, the guy quoted 20MB as being a large amount of data to send.

    Conclusion: Only sales & marketing would try to sell a product that turns 20MB into 100MB, sends it via UDP, only in order to have the results XOR'd together.

    Where do they get these people?

    1. Re:XOR = advanced algorithm by BlueUnderwear · · Score: 2
      Absolutely. The only usefulness for this would be in a situation where two-way communication is simply not possible. There are improvements which can be made to TCP for file transfers, but dropping all feedback mechanisms is simply stupid.

      Dropping all feedback is interesting in situations where:

      • there is a very high latency, and feedback would arrive so late that it would seriously slow things down (i.e. satellite links)
      • the return channel is costly (again, satellite links, with a terrestial (dial-in) return channel). If you can do away with the return channel, you win the cost of dialing in, which is interesting if you are on a metered telco, as is common in Europe.
      As far as only a certain number of packets being required to be received on the other end, that is either completely false, or the overhead is astronomical. More likely it is just plain wrong, or this product is complete vaporware to begin with.

      This is feasible in a way to make the number of extra packets tuneable to the expected loss.

      The algorithm is indeed based XOR, but that's not the only component though. The idea is to define a field on the set of all byte (or short, or word...) values. You use XOR as the addition, and Galois multiplication as the multiplication.

      Then you treat your data blocks as vectors, of which you can do linear combinations in the Galois field.

      If you have n data blocks to transmit, and want to add k redundant blocks, you first arrange your n data blocks in an n-element vector. Then you multiply that vector with an n times n+k Vandermonde matrix to optain a new vector of n+k elements. Those n+k elements are the blocks which will be effectively transmitted

      A Vandermonde matrix is a matrix having the following form:

      1 x_1^1 x_1^2 x_1^3 ...
      1 x_2^1 x_2^2 x_2^3 ...
      1 x_3^1 x_3^2 x_3^3 ...
      1 x_4^1 x_4^2 x_4^3 ...
      .......
      A square Vandermonde matrix has the interesting property of being inversible. Moreover, a subset of rows of a Vandermonde matrix is still a Vandermonde matrix. Loosing packets is equivalent to dropping rows (as each one of the n+k transmitted packets is obtained by multiplying one row of the matrix by the vector of the original n packets).

      The receiver just calculates the remaining Vandermonde matrix (by striking out the rows that correspond to the dropped packets, plus some more if less than k were dropped), and inverts the remaining matrix. By multiplying this inverse with the vector of received blocks, the receiver can obtain the original vector of n data packets.

      The nice thing about this is that k is freely tuneable (as long as the field is big enough: but if you define your field over 4 byte values, that should not be a problem). So, you just take a value for k that matches the expected loss rate plus some comfortable safety margin, and you're set. Considering that, turning 20MB into 100MB will only be necessary if you expect to loose 4 packets out of 5...

      Of course, all this doesn't resolve the issue of flow control, so the sender needs to be tuned manually such as not to emit faster than the physical link can handle.

      --
      Say no to software patents.
    2. Re:XOR = advanced algorithm by BlueUnderwear · · Score: 2
      If you expect to lose 4 packets out of 5, you need to turn 20MB into a lot more than 100MB.

      You are right, you need slightly more to cover "standard deviation". If you toss a coin 100 times, it's very rare that you get exactly 50 heads. You may get 40, you may get 60. That's due to standard deviation, but the effect wears off the bigger your sample. So more than 100MB will be necessary, but not substantially more.

      If you transmit 100 packets, and expect to receive 20 of them, the chances that you transmitted the right 20 is very very slim. And you absolutely cannot produce a scheme where any 20 will give you the right answer.

      Please reread my post. Any 20 are ok, that's the whole point.

      This information must be transferred reliably and in order

      Order is easy to take care off. Just add a sequence number to your packets to tell them apart (yes, that's another overhead, but it's small).

      You simply cannot have two different sets of 20MB be equivalent, because if you do, you lose information (pidgeon-hole principle). So given 19 packets that are received, there is exactly one 1 MB packet which can be received.

      Well, if you re-read my posting, you'll see that yes, any 20 packets would be enough to reconstruct the information (as long as they are different, of course...duplicating a same packet 20 times won't work obviously).

      It's been a while since I've calculated hamming distances, so I'm not going to get into the exact number of packets that need to be sent, but I hope that my discussion above showed that the number is greater than simply multiplying by the inverse of the expected reliability.

      The number is indeed greater (due to standard deviation), but not substantially so.

      If you're interested in the specifics, please read Luigi Rizzo's paper.

      You might also want to check out udpcast which provides a working implementation of a FEC algorithm, that works in practice (albeit only on "slice" sizes of up to 128 packets, but that's purely for practical reasons: in theory no such limitation would be necessary).

      --
      Say no to software patents.
    3. Re:XOR = advanced algorithm by Happy+Monkey · · Score: 2

      There's a program called "Mirror" that's out on usenet these days that will reconstruct missing parts of a posted file using the parts you do have and a small number of extra files. I'm not sure how many of each are required, but the number of extra files is very small compared to the number of actual files, and it can reconstruct quite a few missing files.I can't get more specific, but it's pretty cool.

      --
      __
      Do ya feel happy-go-lucky, punk?
  27. Re:Link to product by Omnifarious · · Score: 3, Informative

    Oh, yes, here's a link to the SourceForge project for SwarmCast.

  28. Tornado Codes by Jonas+�berg · · Score: 5, Informative
    While not actually related, John Byers, Michael Luby and Michael Mitzenmacher wrote a paper on using Tornado codes to speed up downloads. Basically, what they propose is clients accessing a file from more than one mirror in parallel and using erasure codes to make the system feedback-free. The abstract:

    Mirror sites enable client requests to be serviced by any of a number of servers, reducing load at individual servers and dispersing network load. Typicall, a client requests service from a single mirror site. We consider enabling a client to access a file from multiple mirror sites in parallel to speed up the download. To eliminate complex client-server negotiations that a straightforward implementation of this approach would require, we develop a feedback-free protocol based on erasure codes. We demonstrate that a protocol using fast Tornado codes can deliver dramatic speedups at the expense of transmitting a moderate number of additional packets into the network. Our scalable solution extends naturally to allow multiple clients to access data from multiple mirror sites imultaneously. Our approach applies naturally to wireless networks and satellite networks as well.

    I don't have the paper in a computer format, but the number is TR-98-021 and John and Michael were both at Berkeley at the time (1998), so it should be fairly easy to find if someone is interested. Doubtlessly, a number of other reports on the subject should also exist that deals with the same problem but with different solutions.
    1. Re:Tornado Codes by NickPest · · Score: 2, Informative
      You can find the paper here.

      The folks at Digital Fountain are indeed using a highly-tuned (and proprietary) version of tornado codes. I also recommend the following papers if you're interested in what I think has the potential to be the greatest thing since TCP: tornado codes + end-system multicast

      <shameless plug>
      I'm currently working on a research project with John and others that integrates tornado codes and end-system multicast into a Freenet-like system. Best of all, it's GPL'd!
      </shameless plug>

  29. What's going on here? by gotan · · Score: 2

    Sorry, the whole article seems to make some magic mumbo-jumbo out of the process. Apparently the file is transformed, but how does that transformation help? The main difference between UDP and TCP in this case is, that TCP maintains the sequence of Packets, so after splitting a file up, sending it as TCP-Packets and combining it again, all parts (sent as Packets) are in the right place. UDP does no such thing, and also UDP doesn't check, if a packet really reached it's destination. This frees UDP of some overhead TCP has. But to send a large File (with a simple approach), now you have to label each UDP-Packet with a sequential number, and, at the end, check if all Packets arrived (and maybe request missing Packets again), then rearrange them according to the sequence numbers.

    Now i don't see, how a transformation of content helps here, instead of adding the information where in the file the packet goes (a kind of serial number), now you have to label, where in the equation it should go (a kind of coefficient index), so the receiving end knows, whether it has all information, and which information is still missing, and must be requested again.

    --
    "By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
    1. Re:What's going on here? by bpowell423 · · Score: 3, Informative

      That's not it. Because of their algorithm, order doesn't matter, and neither do dropped packets. The receiver only needs any n packets, so the transmitter keeps sending until the receiver says it got enough. Then the results are magically XORed to get the original file. So, no, you don't need a sequential number, you don't need to check if all packets arrived, and you don't have to rearrange them.

  30. More commentary by autocracy · · Score: 2

    It's an encoding scheme that sends you the instructions on how to build something rather than the stuff itself. Not so special as they make it sound. Saying that you get the data without it being sent to you is the biggest troll for mid-level clueless managers that want to download their "repr0ns" faster. Not that I'm even sure it will work that well....

    --
    SIG: HUP
  31. Think Reliable Multicast + XOR Recovery by hughk · · Score: 2
    This is basically what the guys doing reliable multicast get up to plus what you do for tape backups. It isn't particularly new.

    You create your data in records and groups. Each group contains a longitudenal XOR of the other records within the block. This comes from tape backups that were proof against bad-spots and was later used in RAID.

    You then sequence and shoot your data in records across the network. If one record is dropped, it can be recreated through the XOR redundancy record. If two records are dropped, you need a rerequest mechanism. This can be either on UDP or via a separate TCP link.

    If you want an example of prior art, go to the trading room of a bank. How do you think all those prices are delivered to every workstation?

    --
    See my journal, I write things there
  32. michael didn't read the article carefully, I guess by rlowe69 · · Score: 2
    ...wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together?

    michael, this is not what the product does. From the article:
    By translating a packet stream into mathematical elements, the company eliminates the back-and-forth transactions that confirm whether data has reached its destination. In the Digital Fountain approach, the receiving end waits until it has received a certain number of packets, then signals the transmitting side to stop sending. The operation doesn't require a network processor, but relies instead on the computational power of standard PC processors.

    The quirk is that none of the data is ever transmitted; the receiving end creates its own copy of a file based on a complete set of mathematical equations.

    It appears as though the singal is broken down into equations, that when combined produce the original data. These equations are all sent from the same server to the destination client. The speed increase then comes from the fact that the size of the equations is less than the size of the data.

    The article does not mention that the equations come from multiple servers, which is a very big difference! IMO, this technology is much more newsworthy than yet another multi-server downloading tool like Kazaa.
    --
    ----- rL
  33. How it works by gargle · · Score: 5, Insightful

    The EETimes article is extremely poorly written.

    The technique used by Digital Fountain is called Forward Error Correction. It allows a message M with m parts to be encoded into n parts, where n > m. The interesting thing about this is that any m of the n parts will allow the original message M to be reconstructed.

    This means that if a part of the message is missed, the receiver doesn't have to request a resend .. it just continues listening. This is especially cool for multicast transmission since even if receivers A and B miss different parts of the message, the broadcaster doesn't have to send the missed parts of the message to the different receivers - it just continues broadcasting since any part can substitute for any other part.

    1. Re:How it works by Raphael · · Score: 2
      The technique used by Digital Fountain is called Forward Error Correction.

      Several Forward Error Correction techniques are used in cellular networks, for example. If you have a GPRS or (soon) UMTS phone, it is already using some of the techniques that appear to be incredibly advanced if you read the EETimes article. But in fact, all these things are well known. I think that several DAB/DVB (Digital Audio/Video Broadcasting) protocols do the same as well.

      --
      -Raphaël
    2. Re:How it works by Hieronymus+Howard · · Score: 2

      DVB does indeed use FEC. I think that the standard FEC ratio used by Sky in the UK is 2/3 (i.e. where n is 50% greater than m).

      HH

  34. read then speak, not distributed, not compression by buridan · · Score: 2, Informative

    ok, people, you seem to be far away from your thinking apparatus.
    compression takes data, applies an algorythm to it to generate new data that is representation of the whole.

    contrary to that, this transforms the data to an algorythm that simulates the whole.

    it is a minor point of order, but one worth thinking about because it is theoretically different.

    this is also not distributed, it is point to point, I don't know where the submitter got the distributed model from, I suspect he pulled it out the air, but this is very clearly not that. It requires a machine at each end of the point2point.

    however, logically, all data that is not random, can be represented by equations that are smaller than the data. However, the real load exists in the generation of the equations from the data, and not the reconstruction of the data itself, so for me this seems to be quite a possible project, though i suppose it will take quite a bit optimization.

    one other thing, the article does not say that it uses a standardized technique, and it would be interesting if they did not use standard vector analysis or the like. If they used vectors, then this could just reduce to a system of file conversion from one standard to another. I think it would be far more interesting to be what it says it is supposed to be a file conversion to mathematical simulation of the file.

  35. FTP Unreliable? by Tim+C · · Score: 2
    From the article:
    "FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes," said Charlie Oppenheimer, vice president of marketing at Digital Fountain.

    Maybe I'm misunderstanding the quote, but I semi-regularly download 600+ MB iso images and other multiple-hundred-meg files over ftp links at work, and I've never had problems. It's probably just me, but that really sounds like a load of bull, designed to whip up interest in an otherwise lack-lustre product (given it's high price).

    Cheers,

    Tim
  36. More details of their method by mfg · · Score: 2, Informative

    See http://www.digitalfountain.com/getDocument.htm/tec hnology/DF_MetaContentWhitePaper_v4.pdf

  37. The RFC by gargle · · Score: 5, Informative

    http://www.ietf.org/internet-drafts/draft-ietf-rmt -info-fec-01.txt

    The use of Forward Error Correction in Reliable Multicast

    Enjoy.

  38. Forward Error Correction, MojoNation by Adam+J.+Richter · · Score: 2, Informative

    The technique is called Forward Error Correction. I don't know much about it, but I know that you can do things like break up a file of N*B bytes into 2N blocks of B bytes each and then be able to reconstruct the file from any N of the 2N blocks. The GPL'ed Mojonation system uses it, if I recall correctly, to distribute each 32kB chunk of data into eight 8kB blocks allowing reonstruction from any four of them.

  39. Might Be more Related than you think by Anonymous Coward · · Score: 3, Informative

    While not actually related, John Byers, Michael Luby and Michael Mitzenmacher wrote a paper on using Tornado codes to speed up downloads. Basically, what they propose is clients accessing a file from more than one mirror in parallel and using erasure codes to make the system feedback-free.

    Those "Tornado Codes" you mentioned are coauthored by one of the executives of this company (Luby is CTO).
  40. Luby Transform Codes by Detritus · · Score: 5, Interesting

    After looking through some of the material on the company's web site, this product appears to be based on LT (Luby Transform) coding. Each encoded packet is the result of XORing a random selected set of segments from the original file. When sufficient packets have been received, they can be used to reconstruct the original file. Insert magic algorithm. The nice thing about this is that the transmitter can continually stream packets, and a receiver can start collecting packets at any point in time. When the receiver has collected sufficient packets, it can reconstruct the original file. Packet ordering is totally irrelevant. You just need enough packets to generate a unique solution. The math for the code has not been published yet, but this is supposed to be a successor to tornado codes, which have been described in the literature.

    --
    Mea navis aericumbens anguillis abundat
  41. Re:A good idea, but old by cloudmaster · · Score: 3, Informative

    Since it doesn't use TCP, I'll bet it won't have any problem handling the TCP part of "TCP/IP connections"... The networking end sounds really simple - send the amount of data to expect, wait for confirmation, send all of the data once *without* waiting for confirmation until it's all been sent. That's a lot easier than handling the overhead of tcp, which is the whole problem they're trying to solve.

    BTW, it helps to read the article before posting "insightful" comments. :) There's a nice little demo at http://www.digitalfountain.com/technology/coreTech nology.htm

  42. security by night37 · · Score: 2, Offtopic

    I wonder how hard it would be to highjack a UDP based session like this. What if bogus packets are injected along with the stream of valid ones. Does the math include any form on encyption? Or is this a tunnel for other protocols? Damn it, we need to move away from clear text protocols, not create new ones!

  43. A bit insulting by srichman · · Score: 2
    Maybe, but comparing what GetRight et al. do (parallel downloading from FTP mirrors) to how Digital Fountain would achieve the same thing (erasure codes) is like comparing cups and string to IP telephony. Sure, they achieve the same thing, but the comparison is a bit insulting...

    With the GetRight solution, 100 people downloading from my FTP mirror = 100 TCP streams worth of bandwidth and CPU consumption. With the Digital Fountain solution, 100 people downloading from my mirror = 1 stream worth of bandwidth and CPU consumption.

  44. Lots of programs do that by StrawberryFrog · · Score: 2
    A win32 FTP client called GetRight as been capable of doing this since several years ago.

    They called it "Segmented Downloads", ie the program would hit multiple ftp sites for the same file, doing a resume to get different parts at the same time. Heck, it would even locate the mirrors for you.

    And yes, it caused a substantial improvement in download speed. It seems thus that the bottleneck is seldom in the last mile.

    But this has little to do with the article, which as far as I can tell is mostly gibberish.

    --

    My Karma: ran over your Dogma
    StrawberryFrog

    1. Re:Lots of programs do that by InsaneGeek · · Score: 2

      The splitting is really meant for multiple ftp servers presenting the same file, not for going to the same one webserver multiple times... like this:

      Your box A sits on a T3

      Ftp server B sits on a T1
      Ftp server C sits on a different T1
      Ftp server D sits on another different T1

      The next three processes occur at the same time:

      box A ftp's to server A starts request (getting first 1/3)
      box B ftp's to server B issues a reget a third of the way into the file (getting middle 1/3)
      box C ftp's to server C issues a reget two-thirds of the way into the file (getting last 1/3)

      Maximum bandwidth from any one site is a single T1, you are then able to download the file at 3xT1 (~4.5mb). More often than not, with broadband it's not the end user who's got a bottleneck, but the end site who's got a T1, but has 1000 other people requesting files at the same time.

    2. Re:Lots of programs do that by InsaneGeek · · Score: 2

      Getright does make multiple connections to multiple servers... checkout http://www.getright.com/fastest.html under the "Segmented (Accelerated) Download" header, it talks about this exact feature, and talking to multpile hosts.

      Maybe you have it configure to allways make multiple connections. I've never used the software so I don't know if there is such a setting. (only other reason I could see for this, is for sites that have a per-connection bandwidth set, you could then get around the admin settings).

    3. Re:Lots of programs do that by igrek · · Score: 2

      The main difference is that GetRight or others of the same type work on top of the TCP/IP, while the article talks about UDP-based protocol.

      The idea is to get rid of TCP/IP overhead, as far as I understand.

    4. Re:Lots of programs do that by reverius · · Score: 2

      I believe you were using GetRight wrong.

      The way it actually works is that you tell it to search for mirrors (using an ftp search site like ftpsearch.lycos.com) and it finds a bunch.

      Over my connection, downloading a linux distro usually goes about 60k/sec from any single site.

      Using the mirror searching and segmented downloading in GetRight, i've got 450k/sec from 8 sites at once.

      This was with no effort on my part, except for clicking on the file, after I configured GetRight.

  45. Swarmcast by Orasis · · Score: 5, Informative

    The "Math" they use is called Forward Error Correction (FEC) and is the same stuff that the Swarmcast distributed download system is based off of (http://sf.net/projects/swarmcast/).

    I am the creator of Swarmcast, and we at Onion Networks (http://onionnetworks.com/) already have a product that can provide file transfers just as fast as Digital Fountain, but ours is 3-5.5% more efficient and much much cheaper.

    On the open source front, we are working on the next generation of the Swarmcast technology which will combine parallel downloads over HTTP, Multicast IP, and UDP (for tunnelling through NAT). We have started work defining a standard for this system called the "Content-Addressable Web" and hope to see it implemented in everything from Apache to Mozilla.

    Please mod this up, people shouldn't be paying $150k for these types of technologies.

  46. A little math... by Cato+the+Elder · · Score: 2

    You can only solve a linear system of equations with two unkowns and two equations if the equations are independant. And you can't have more independant equations than you have variables. So your calculation is very optomistic. Imagine if I send you the following data:

    X+Y=4
    X+2Y=5
    2X+4Y=10
    2X+2Y=8

    With 50% packet loss (as opposed to a 50% chance of loss per packet*) you only get two useful equations 2/3 of the time. Of course, these boxes are expensive and can do a lot more number crunching than just linear algebra. But of course, I wouldn't really have expected a good comparison from a company whose veep of marketing said "FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes" Clearly all those game demo sites on the net need to get with the program.

    *Computing the chance of loss per packet gives:

    1/16 chance of getting all four, 100% success
    3/16 chance of getting 3 of 4, 100% success
    8/16 chance of getting 2 of 4, 66.666% success
    3/16 chance of getting 1 of 4, 0% success
    1/16 chance of getting none, 0% success

    or 28/48 is a little more than 4/7, slightly worse odds than if you always got 2 of 4.

    1. Re:A little math... by gowen · · Score: 4, Insightful
      your calculation is very optomistic. Imagine if I send you the following data:

      X+Y=4
      X+2Y=5
      2X+4Y=10
      2X+2Y=8
      You made an astonishingly bad choice of equations. If I send you
      X+Y=4
      X+2Y=5
      X+3Y=6
      X+4Y=7
      then you may find X=3, Y=1 from *any* pair of equations you recieve.
      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    2. Re:A little math... by gowen · · Score: 2
      isn't sending just "X=3, Y=1" simpler? It seems to take 1/4 of the bandwith. You can even send it twice
      Well, yes. But (as I understand, and my maths is better than my networking) you then need ACKs to state each data block has been received (i.e. not a stateless UDP connection).
      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    3. Re:A little math... by RovingSlug · · Score: 2
      isn't sending just "X=3, Y=1" simpler? It seems to take 1/4 of the bandwith. You can even send it twice
      Well, yes. But (as I understand, and my maths is better than my networking) you then need ACKs to state each data block has been received (i.e. not a stateless UDP connection).
      My understanding is that it's still stateful, just in the sense of the entire transmission rather than each packet. So, rather than being hardset to send those four equations, the sender just knows the process to create them. And, the sender keeps creating and sending until it receives that single ACK ("OK! Enough!" instead of "OK, got that one, next?"). If there's some overflow and the receiver gets a few extra packets, no biggie, it's still cheaper than the time and data that would have been spent handshaking along the way.
    4. Re:A little math... by gowen · · Score: 2
      you still have a [25%] chance of not getting all the data
      True, and if you send 'X=3', 'Y=1', 'X=3', 'Y=1' as four packets with 50% loss, you have a 7/16 (~44%) chance of not getting all the data (I think). So thats an improvement (modulo all the implementation/protocol overhead, I guess).

      You can reduce this to an arbitrarily small percentage by sending more linearly independent equations, but that takes more bandwith.
      Thats true of any algorithm you choose for encoding your data.
      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  47. Tornado codes by srichman · · Score: 5, Informative
    This technology (from the write-up anyway) uses some kind of proprietary technique to re-map the data into another domain and send the information required to reproduce it. It sounds kind of like sending a waveform as a series of Fourier coefficients rather than as actual data samples.
    Actually, they use Tornado codes (or a proprietary update thereof), an erasure code. That is, they use forward error correction to encode streaming data or a software distribution over a single (or multiple) client-independent streams of multicast. After a client grabs enough packets, it can reconstruct the source file.
  48. Raid 5 Errors...Disks by tonywestonuk · · Score: 3, Insightful

    I expect this system is a little like Raid 5, Used on Hard Drives. Eg, 5 disks, 1 goes down, you still have enough data to restore the failed drive.- This seams simalar to been sent 5 Million UDP packets, 1Million get lost on the way, however, you still can piece together perfectly what you wanted from the remaining 4Million.

  49. Re:Compression by hyoo · · Score: 2

    All the data doesnt need to come from the same server. I suppose it is like load balancing.

    If you have 20kbps to mirror A, and 80kbps to mirror B. Server A and B start spamming you with packets of whatever you are downloading. Then 20% of the total packets you recieve are from A, and 80% of the total are from B. Given all the partial packets you got, you can now reconstruct the porn DivX. This would be faster then getting 100% of the packets from B which would take 25% longer.

  50. It's called Forward Error Correction by mprinkey · · Score: 2, Informative

    There is nothing "proprietary" here. The techniques for reliable transmission of digital data over unreliable media has been a central area of EE research for at least three decades. The unreliable media is now UDP instead of broadcast RF or transmission lines.

    Line reliability in the "normal" sense is classified by bit error rates. Here, the analogous rating would be packet dropped per second. So, it seems like a straightforward application of FEC. It is useful for the reasons that they state. TCP transmissions look like this:


    Client: Request Data Packet #00001
    Server: Transmit Date Packet #00001 then wait for Client to say OK or Retransmit.
    Client: Request Data Packet #00002
    Server: Transmit Date Packet #00002 then wait for Client to say OK or Retransmit.
    ....
    Client: Request Data Packet #20343
    Server: Transmit Date Packet #20343 then wait for Client to say OK or Retransmit.
    DONE

    The FEC/UDP form would look like this:

    Client: Request Data FILE
    Server: Transmit FEC Data Packet #00001
    Server: Transmit FEC Data Packet #00002
    ...
    Server: Transmit FEC Data Packet #20343
    Client: OK...I got it. Stop Transmitting.
    DONE

    There is no per-packet dialog between server and client. This removes network latency from the transfer equation. Note that FEC over UDP does require redundant information to be transmitted, so there is no free lunch here, but it is certainly likely to be faster than any TCP/IP implementation.

    1. Re:It's called Forward Error Correction by ZxCv · · Score: 2

      But with TCP, it doesn't usually require per-packet dialog. It starts off per-packet, but then increases exponentially until packets have to be transmitted. So if the first packet goes through fine, it will send 2 before requiring dialog. If those go fine, it will send 4, and so on and so forth.

      In practice, the FEC over UDP may be faster than any TCP/IP implementation in certain situations, but when you add in the extra volume of bandwidth (ala this company's implementation), I think it becomes much less attractive.

      --

      Perl - $Just @when->$you ${thought} s/yn/tax/ &couldn\'t %get $worse;
  51. FEC Library by Orasis · · Score: 5, Informative

    Oh yeah, and you can download our completely patent-free and open source FEC library from here and build your own Multicast or UDP based download system very quickly (provided you get the flow control right :)

    --
    Justin Chapweske, Onion Networks

  52. Multicast by srichman · · Score: 4, Interesting
    Quite correct. This protocol does not sound at all TCP friendly [yahoo.com]. It needs some way of dynamically responding to network conditions to be that way.
    Well, "this protocol" is just multicast (from a network perspective). Though there have been research attempts to introduce pruning in multicast streams, it is inherent a non-flow controllable transmission protocol. If you take offense to the lack of "TCP friendliness" in multicast, then I suggest you complain to Steve Deering, not Digital Fountain.

    Multicast is congestion friendly in that it can inherently seriously reduce bandwidth consumption, but it's obviously not congestion adaptive. I think the easiest (and probably best) way to introduce congestion control in a multicast is to have multiple streams at different rates, e.g., you have five simultaneous multicast webcasts of a Victoria Secret show, and folks can join the mulitcast group with the video quality that best suits the bandwidth and congestion on their pipes. This idea works very well for the Digital Fountain-style software distribution: rather than having a server serving the same stream at 10 different rates, you can have 10 servers streaming their own unsynchronized Tornado encodings, and clients can subscribe to however many their pipes can support. With 10 streams of data coming at you, you can reconstruct the original data ~10 times as fast.

    1. Re:Multicast by BeBoxer · · Score: 2

      Yeah, you are right. This protocol would be perfect for multicast! People have been working on developing "reliable" multicast transmissions of data forever. If you try to do it with the usual TCP-style transfer it's an amazingly hard problem. Especially since one of the standard "requirements" of multicast is that senders don't have to know anything about the receivers! This limitation alone makes reliable data transfers almost impossible within the boundaries of standard multicast.

      Until this came along. All a sender needs to do is pump out the data continuously on a given group address at some network-friendly rate. Any receiver who wants the data just connects and slurps down data until it has the whole file. Then disconnect from the group and the routers prune the traffic off of your network. It's genius. Each listener only has to receive traffic for exactly as long as they need to get the whole file! Since the overhead is claimed to be only 5% or so, it's hard to imagine any other scheme being more efficient.

      On the other hand, I don't see much benefit for unicast connections. If their idea is to use their algorithms to allow them to send data faster than TCP without flowcontrol, they are going to piss off a lot of network managers. Their traffic is going to look an awful lot like a DoS attack. In fact, without flowcontrol in a lot of cases it will be a DoS attack. The article gives the case of somebody with a 32Mbps connection who only gets 0.5Mbps out of TCP. Well, if they decide "hey, I'll use 1/3 of my connection for this traffic" and start dumping 10Mbps into a network that was already bottlenecked, they can expect a call from a very upset network person who wants to know why they are melting down their network.

      There are also some lies about TCP in the article. TCP does not require packets to arrive in sequence. Well, a few old brain-dead stacks did. But no OS made in the last few years cares that much about out of order packets. If you have a high bandwidth connection but you can't seem to get much thruput one of two things is happening: you have buffers that are too small on one or both ends of the connections, or the network is really too congested to go much faster. In the first case, there are numerous fixes if you take the time to research them (although I'll admit it's a pain in most cases.) Here's a search for more info. The article even implies that this is the case they are trying to fix. "They had too much turnaround". Hmm. Sounds like latency. 32Mbps * (your round trip time) = suggested buffer size. You don't need to spend 100K on hardware, just tweak your TCP stack! Or run a 2.4 kernel which will do a pretty darn good job of autotuning your buffers on the fly.

    2. Re:Multicast by BeBoxer · · Score: 2

      No. Packet loss happens because the link is congested. TCP slows down because the link is congested. Blasting traffic onto a congested link without flowcontrol does not magically make the link less congested. It makes it more congested. Their algorithms don't actually avoid retransmissions. They just rename them. There isn't any fancy math that can help you avoid packets getting dropped.

      Let's take an extreme example of 50% packet loss. Let's also say that the data would fit in 100 packets. With TCP, the sender will have to send about 200 packets in order to get the 100 needed packets thru to the receiver. The same is true of the 'fountain' method. Just because the 200 packets are all different doesn't mean that the sender doesnt need to the send packets. Which is what counts.

      Now, with their method they might send the 200 packets faster than TCP would. But in the larger picture, that just screws everybody else. Look at the white papers on their site describing their technology. They give the example of somebody who has T3 (45Mbps) links, but the network in between them is suffering from 200ms latency and 2% packet loss (which is a shitload by the way). TCP, of course, runs slowly on such a network. As it should. They talk about using their tech to just firehose UDP packets out at 45Mbps, and hey! Your file will get thru in seconds instead of hourse! Which of course is BS. If you start slamming 45Mbps of un-flow-controlled traffic onto a congested link you will probably take it completely down. If you keep it up, you can expect somebody to eventually start filtering your traffic out because you are DoS'ing the network. At best. At worst, you can expect a visit from the FBI because your behavior is exactly the same as a script-kidding DoS'ing somebody, only you won't be bothering to forge your source address so you'll be easy to find.

      Don't get me wrong, their technology has some wonderful applications. But intercontinental traffic is sometimes slow because the links are congested! No fancy math is going to change that. Ignoring the fact that the link is full and blasting traffic into it without flowcontrol will only make it worse! They might think they are on to something really cool, but if it works at all it will only work for one person at everyone elses expense. If everyone stops using flow control and just blasts traffic out as fast as they can, performance will really start sucking.

      All of which they realize, which is why they are working on RFC's for flow control. And when those RFC's get approved, you can bet that the congestion backoff will look a lot like TCP because we have a pretty good idea of how TCP behaves on a macro scale. The whole spin of getting super fast network transfers is basically marketing BS, something which I'm sure the developers are aware of. The real advantages, which are the ones they are pursuing in RFC's, are reliable multicast and (almost) stateless data serving.

  53. All about Digital Fountain by TheSync · · Score: 5, Informative

    OK folks, here is the "real deal."

    Digital Fountain's core technology is called "meta-content (tm)". The meta-content engine produces packets based on the original content, such that if the receiver receives enough packets (just slightly more than the size of the original content), the original content can be recreated. The neat part is that it doesn't matter which meta-content packets you get. If you need to receive 10 packets, you can get 5, miss 5, get another 5, and it works. Or you can get 1, miss 10, get 9, and it works as well. As long as you receive some 10 packets from the "fountain," you can recreate the original content.

    Why is this cool? Several reasons. Digital Fountain claims that TCP connections with RTT of 200ms and 2% packet loss have a bandwidth limitation of 300kbps, no matter how fast the actual transport channel is. So you just go to town to full bandwidth with UDP to use up the entire channel, and use Digital Fountain technology so it doesn't matter which 2% of packets get lost, just as long as you transmit enough packets to make up for the packet loss.

    OK, why else is this cool? Imagine a Digital Fountain continuously transmitting meta-data on a multicast address. If you want to receive a file, just listen to the multicast address. It doesn't matter when you start listening, just as long as you listen for enough packets to recreate the original file. Multicast file distribution.

    Interestingly enough, Digital Fountain has also pioneered multicast on-demand streaming, but the real secret sauce there is something besides meta-content, but meta-content makes it easier.

    As some people have mentioned, you can use UDP with FEC to achieve some error correction. But meta-content can handle long burst errors, whereas FEC is only appropriate for short, random errors. You can literally unplug the ethernet, wait a while, and plug it back in, and you're still good to go with Digital Fountain, as long as you listen long enough.

    I should mention, DF has something called "Fair Layered Increase Decrease Protocol," or FLID, to keep their UDP burst from blowing away other TCP traffic on the network.

    For more information on the basic Digital Fountain technology, see: A Digital Fountain Approach to Reliable Distribution of Bulk Data.

    1. Re:All about Digital Fountain by mikeee · · Score: 3, Insightful

      Digital Fountain claims that TCP connections with RTT of 200ms and 2% packet loss have a bandwidth limitation of 300kbps, no matter how fast the actual transport channel is.

      First, 2% packet loss is terrible, even on a WAN.

      Second, 200ms is terrible latency, unless you're crossing an ocean.

      Neglecting packet loss (which ought to be neglectable, though admittedly isn't at 2%), your maximum TCP throughput will be (TCP Window Size)/(2 * Latency), or the bandwidth, whichever is more. That comes to about 1280kbps on that 200ms link if you aren't using TCP window scaling options, and much higher if you do.

    2. Re:All about Digital Fountain by hqm · · Score: 2, Informative

      You can easily use FEC for burst errors, you just use a cross-interleaved encoding on top of the basic encoding. That's what CD players do and they can eat a 4000 bit long burst error without a hiccup. And CD-ROM encodes another layer on top of CD Audio, and thus can sustain even larger burst errors.

    3. Re:All about Digital Fountain by BlueUnderwear · · Score: 3, Insightful
      As some people have mentioned, you can use UDP with FEC to achieve some error correction. But meta-content can handle long burst errors, whereas FEC is only appropriate for short, random errors.

      This depends on the parameters of your FEC algorithm. Most FEC algorithm do indeed divide the data to be transmitted into "slices" of n blocks, to which they add k blocks. If more than k blocks are lost per slice, you're SOL, even if enough extra blocks are available in other slices. However, there is a way around that: just tune your FEC algorithm so as to make n large enough that all of your file fits into one slice.

      The downside of this is that the larger the slice is, the more computationnally intensive it is. If the only thing you're concerned about are burst errors, just interleave your slices.

      You can literally unplug the ethernet, wait a while, and plug it back in, and you're still good to go with Digital Fountain, as long as you listen long enough.

      This is possible with run-of-the-mill FEC algorithms as well, as long as you put your entire file into a single giant slice.

      --
      Say no to software patents.
    4. Re:All about Digital Fountain by Arlet · · Score: 2

      2% packet loss is not terrible if you consider that most packet loss on WAN is caused by routers dropping frames because of congestions. TCP uses knowledge from these dropped frames to determine this congestion, and basically reduce its throughput to match the link's bottleneck. 200 ms latency isn't great, but it probably assumes crossing the ocean.

      Now, assuming the dropped packets are due to congestion at a bottleneck, there's no reason to assume that UDP+FEC can seriously increase bandwith. If people all start to pump UDP packets at T1 speeds into a WAN, serious congestion will occur, and some router is going to drop the majority of the packets.

      Of course, if only a single person is doing it, bandwith can be improved, but I bet you can achieve similar results by removing congestion control from TCP, and using big windows.

  54. NOT NEWS. by tqbf · · Score: 5, Informative
    Digital Fountain has been around, with product, for a long time. The technique they are building on for file transfer, has been around even longer. Their protocols are even IETF drafts/standards now.

    The concept of "sending equations" instead of data is extremely well-known. It's called Forward Error Correction (FEC). FEC is a very simple idea: take the source data and encode it with parity data so that you can reconstruct the original message from any N chunks of it. One form of FEC that even your stereo components might already do is Reed-Solomon encoding; you can look this up in CS textbooks. If you Google for "Luigi Rizzo Vandermonde", you'll find a fast, free C library for FEC that you can use in your own applications.

    Digital Fountain was started by Mike Luby, who is something of a luminary in information theory/cryptography. The kernel of their company's IP is "tornado codes", an FEC codec that is leaner and faster than competing codecs. The basis for their protocols, last time I checked, is FLID/DL and RLC. These protocols set up multiple channels (or "sources" if you must), each transmitting random chunks of FEC-encoded files.

    The drawbacks to FEC are that it can take a long time to encode data for transfer, which makes FEC hard for some real-time applications like mail, and that the amount of data transferred is going to be some percentage greater than the original file (owing to parity information). But the drawback to FEC file transfers protocols is much more significant: they aren't TCP-friendly.

    The whole Internet depends on protocols that have built-in congestion responses that mimic those of TCP. Protocols that don't either starve TCP flows, or are starved by them. Protocols with no real congestion response at all rapidly destabilize Internet links by consuming all available resources. Digital Fountain originally targeted multicast media transfer. Congestion avoidance in multicast schemes is still an open research question. Does this protocol really scale?

    More to the point, what's the benefit? There's obvious payoff for FEC in multicast, where backtraffic from ACKs and NAKs quickly overwhelms the group and kills performance. But in unicast-world, where we will all be living for the next decade, TCP and TCP-friendly forward-reliable protocols like SCTP already provide good performance.

    Slow news week at EETimes I guess. Or Digital Fountain just hired a PR firm.

    1. Re:NOT NEWS. by adadun · · Score: 2, Informative

      The whole Internet depends on protocols that have built-in congestion responses that mimic those of TCP. Protocols that don't either starve TCP flows, or are starved by them. Protocols with no real congestion response at all rapidly destabilize Internet links by consuming all available resources. Digital Fountain originally targeted multicast media transfer. Congestion avoidance in multicast schemes is still an open research question. Does this protocol really scale?

      Yes, their protocol uses TCP-friendly congestion control - read about it in their SIGCOMM paper.

    2. Re:NOT NEWS. by msouth · · Score: 2

      well, ok, but it was news to me. :)

      --
      Liberty uber alles.
    3. Re:NOT NEWS. by tqbf · · Score: 2

      The irritating thing is that scaleable FEC schemes have been used for reliable multicast for a long, long time, but Digital Fountain (and to a lesser extent Swarmcast) are gathering accolades for what is basically a simple refinement of a very well-known technique.

    4. Re:NOT NEWS. by msouth · · Score: 2

      But haven't you heard? "gathering accolades for what is basically a simple refinement of a very well-known technique" is all the rage. Except they usually spell "accolade" p-a-t-e-n-t.

      Wasn't criticizing your post, btw, just taking the opportunity to be funny.

      mike

      --
      Liberty uber alles.
  55. Droppage by srichman · · Score: 2
    You still need some form of flow control or rate limiting, otherwise a large percentage of the UDP packets are going to get dropped.
    The whole point of Digital Fountain's erasure encoding scheme is that it doesn't care how many packets get dropped on the floor; a client can reconstruct the original data just as efficiently with whatever packets it receives as if the stream had been sent at an optimal rate to begin with.
    Plus, you have the problem of UDP streams stealing bandwidth from TCP streams on a limited bandwidth link.
    This is a problem. You want to satisfy the people with the fat pipes, but not destroy all the little pipes and hose people's TCP windows. See my other post for a possible solution.
  56. Re:Uhh... my shit detector just went off by tzanger · · Score: 2

    For example, lets look at a situation where I need to tranfer data from the east coast to the west coast and my round trip time is 70 ms. If I have a 32 Mbps link, I can send 64K in about 2 ms. So I have to wait 68 ms until I send another 64K. This gives an effective throughput of less than 1 Mbps. Throw in the slightest amount of congestion and things go even further downhill real fast.

    Thank you for that detailled explanation. It seems that my shit detector is a little trigger-happy. :-) Your comment on the "magic algorithm" is spot-on. I could think of a way to pre-determine an algorithm which recodes 64k chunks of data by precomputing all possible 64k datastreams and then simply selecting the datastream which represents that 64k (or whatever size). Anyway the other comments in this article have already attacked the idea on its use of UDP and its lack of net-friendliness. I predict a number of angry customers who paid $300k for a pair of these and whos data is getting dropped/throttled like crazy by their ISPs or the ISPs in the middle. :-)

  57. from P2P to P2net? by gotan · · Score: 2

    Yup, reading a few answers here i'm already reading up on FEC (why didn't the EETimes mention this?). The nifty thing about this is, that it's also a nice way to store information in a distributed way (think freenet, gnutella, ...) as far as i understand it. This opens a whole new perspective on the whole P2P theme, since a piece of information wouldn't be stored on one host alone, and also because of the implied redundancy, meaning that the information would still survive in that network until a critical number of hosts would be disconnected. Also netload would be distributed on the sending end, opening some interesting ways for congestion control.

    --
    "By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
  58. Re:Simple working example by Junta · · Score: 2

    And the process you describe does nothing for showing benefit. Look, we can represent the three original packets using three different packets is a rather pointless endeavor, especially sending all four seems really silly. If you scale that principle at all, pure XORing doesn't work to save any bandwidth. You need a better example to illustrate how this would actually be useful. I don't think you properly understand what the article is getting at...

    --
    XML is like violence. If it doesn't solve the problem, use more.
  59. Re:Swarmcast by anticypher · · Score: 2

    shouldn't be paying $150k for these types of technologies.

    You haven't looked at the outrageous prices of Cisco equipment lately, have you? :-)

    Assuming these boxes are for accelerating multicast applications and preserving WAN bandwidth, then US$70,000 per box sounds competitive with Cisco. Put a large one of these boxes at the headend of a multicast, and one box for each LAN where there are a number of receiving clients, and there could be considerable savings in WAN usage.

    /. readers don't realize how expensive a WAN link costs, and how those costs jump in large multiples when a PHB demands more bandwidth for his pet project. Especially outside of the US where international WAN links cost upwards of $10,000/month for 2Mbit/second, and to add another 2Mbit/sec will cost another $10,000 plus a long wait for turn-up. If the PHboss absolutely has to roll out a multicast application immediately, I'd throw a handful of these boxes on the network, and not bother with buying more bandwidth for a while.

    the AC

    --
    Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
  60. I was a skeptic... by DaoudaW · · Score: 5, Informative

    For example, consider the transfer of a 1 GB file with an average Internet packet loss rate (2%) and global average RTT (just over 200 msec). For this case TCP has an inherent bandwidth limitation of 300 Kbps regardless of link speeds and thus will take more than 7 hours to transfer the file. A Digital Fountain server sending Meta-Content allows almost complete bandwidth utilization. With Digital Fountain, on a fully utilized 1.5 Mbps (T1) line the transfer of the 1 GB file will take about 1.5 hours and on a fully utilized 45 Mbps (T3) line the transfer time will be a little over 3 minutes.

    I was a Skeptic, but I'm now that I've read the Digital Fountain white papers: Meta-Content Technology White Paper and TCP White Paper, I'm a True Believer.

    I don't pretend to understand all the details, but the technology is based on the Luby transform which was invented by the company's founder. Essentially the content is broken into segments which are then randomly XORed to produce the metacontent. This provides redundancy since each segment of metacontent contains information from several of the original segments. Also the metacontent segments do not require sequential transmission.

  61. Re:Simple working example by Junta · · Score: 2

    Yes, but that doesn't work well in practice. One, by sending an extra packet, you're wasting bandwidth. Issuing an acknowledgement that the first three packets have been received would likely be too late to stop transmission of the fourth packet, otherwise, your complaint about negative acknowledgement being too slow becomes moot.
    For another, a large portion of transmission errors are not localized to a single packet, thus providing for a single packet to be in error (parity scheme) is essentially useless, as you'd probably need to send a negative acknowledgement anyway.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  62. Morpheus? by mcrbids · · Score: 2

    Morpheus allows for downloads from simultaneous sources.

    In fact, I've seen my system upload files that I was still downloading!

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
  63. Re:A good idea, but old by jmccay · · Score: 2

    You would still want some form of error checking to make sure the you got all the right data. I would hate to download somehting only to have a corrupt file.
    I think the big problem will be in the assembly and disassembly. Instead of the bottle neck being on the network, it will be on yoru box slowing everything else down.
    I really don't trust this. I would hope the use integer math. Otherwise, you can end up with some problems due to the fact that you can't represent some numbers IEEE floating point notation (which a lot of computers use to store decimal numbers).
    I don't see this taking off. There is too much possibility to mess up.

    --
    At the next eco-hypocrisy-meeting, count the private jets used to get to the meeting. Should be interesting to see that
  64. My take on the situation by athmanb · · Score: 2

    When you remove all the marketing talk from that article and the company website, what remains is essentially a protocol which:
    a) Uses UDP instead of TCP and implements its own proprietary flow control by sending the data multiple times
    b) Has no streaming capabilities whatsoever.

    Also, it's pretty evident that no matter how nifty their algorithm is, the data which needs to be transmitted before the file can be reconstructed needs to be at least as big as the size of the original file. Quite probably a lot more because of redundancy since the receiver can't acknowledge data.

    I second the originals' posters feelings about the protocol...

  65. They mail a ruler... by Christopher+B.+Brown · · Score: 2
    I believe this is the scenario where they take the information, compress it, thus achieving a Very Large binary number, let's call it X.

    They then find the power of 2 that is just larger than that. That's Y which is equal to 2 ^ N

    They then take a ruler, 1 foot long, and record on it two pieces of information:

    • The power of 2 that they're using, N
    • They then place a mark at the appropriate location to indicate where the location X / Y is, in feet.


    Insert ruler into padded envelope, and mail...

    Or was this just from some old science fiction story???

    --
    If you're not part of the solution, you're part of the precipitate.
  66. Isn't this what PAR is all about? by freeweed · · Score: 2
    Maybe I'm missing my mark here, but with multipart rar archives, this new PAR feature sounds just like what you're describing. As long as you snag the correct number of files, it doesn't matter which ones you get.. you can recreate the source.

    Or am I completely off my rocker here?

    --
    Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
  67. Is this like Usenet .PAR files? by ScottBob · · Score: 2

    Lately in the binaries newsgroups, there has been increasing use of .PAR files. If a large number of .RAR or .ACE files are downloaded but some are incomplete or corrupt, just so long as you have an equal number of .PAR files as there are missing/corrupt files, the original data can be reconstructed, no matter which of the original files are missing. An explanation and a PAR file reconstruction program can be found at:
    http://www.disc-chord.com/smartpar/

  68. Swarmcast by mvw · · Score: 2
    Mirror sites enable client requests to be serviced by any of a number of servers, reducing load at individual servers and dispersing network load.

    But where to get the mirror sites from? For example if a new Star Wars trailer hits the net?

    An interesting idea here is to distribute the server workload onto the clients downwards:
    Swarmcast is such a protocol. It uses forward error correction as well, and I believe some of the guys whose names were mentioned in this discussion, are involved in this as well.

    Any expert who can comment on this one?

  69. Re:dropped packets.. no problem? by Happy+Monkey · · Score: 2

    There are m unique packets, but any group of n of them will decode to the file where m is less than n. So if a packet is dropped, it's OK, as long as less than m-n are dropped. Essentially, there is redundant data, and the only response that needs to be sent is the "Ive got it" response.

    --
    __
    Do ya feel happy-go-lucky, punk?
  70. Re:Archie .vs. Veronica by Medievalist · · Score: 2

    /.
    Well, RECENTLY, Veronica was mouse-driven.

    Of course, if you weren't a young pup still wet behind the ears, you'd remember that Archie in _Archie_Comics_ had that checker-board thingy on the sides of his head, and Veronica didn't. And I've heard that Archie was latently homosexual, and Veronica wasn't (that's why he was always chasing the unattainable Veronica instead of scoring with hot-to-trot Betty).

    I'm sure Bhil will correct me if I'm wrong.
    --Charlie

  71. Bittorrent does that! by justin_squinky · · Score: 2, Interesting

    Bittorrent
    does that! It's not really trying to be a napster or gnutella but a way to allow people to host a high volume website without having a lot of bandwidth. It works seamlessly with mozilla and IE. It's also quite fast unlike the anonymous p2p networks.

  72. Maybe like this... by DrCode · · Score: 2

    Client: Requesting 'kword'.
    Server: Kword binary: 1.2Mb; Kword source .5Mb.
    Sending source.
    Client: Source received. Unpacking. Compiling.
    Done.

  73. Re:UDP? Not always good by jidar · · Score: 2

    Uh.. you mean ICMP I think. Widespread blocking of UDP isn't something I've noticed. That would suck considering 90% of the games people are playing online are UDP.

    --
    Sigs are awesome huh?
  74. If there's redundancy, there has to be overhead by Animats · · Score: 3, Insightful
    The article claims that there's no overhead added, in terms of extra data, to transmit data in this way. That has to be wrong. If you add redundancy, you have to add bits. Maybe not very many, but some. Basic thereom from Shannon.

    I'd guess they have to send about 2x as much data as the original for this to work. Maybe they're applying a compressor (like gzip) up front, and taking credit for that.

    As others have pointed out, there are lots of other schemes in this class. Bear in mind that this is a protocol for broadcast and multicast, not general file transfer.

  75. Junk or Genius? by Spazmania · · Score: 2, Interesting

    This could be true genius or mediocre junk depending on the details. That article wasn't clear: Does it have to be N specific packets (in any order) or can it be any N of the transmitted packets?

    N specific packets is, well, nothing special. The idea of using negative acknowledgements instead of positive acknowledgements and retransmitting only the missing pieces is nothing new. It was used extensively in pre-Internet days on dedicated connections. Ever heard of Z-Modem? Its been avoided on the Internet because it makes the congestion control problem very difficult. Translating it into symbols is really just saying that they also compress the file first. Whoop de doo.

    On the other hand, if it can be ANY set of N of the transmitted packets, well, that's downright incredible. The applications for such a technique are boundless... Everything from finally making multicast viable and desirable to latency and congestion indifferent file transfers to ultra-reliable offline storage.

    How would you like a web server that can serve a T3's worth of clients off of a T1 and do it in such a way that the smart web browser can pre-cache data from the server in anticipation of the reader's next click, realtime adaptive based on the documents currently in multicast transmission to somebody else? Oh yeah, the web server can be on the Moon and respond to you as fast as if it was across the street. Genius.

    So which is it? Any set of N or a specific set?

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  76. Might I suggest rsync... by cduffy · · Score: 2

    ...as a suitable high-level system for those concerned about data corruption in file transfer? (The P2P system Electronic Donkey 2000 also incorporates a great deal of high-level error checking -- discarding and retransmitting file blocks whose checksums fail and even those which were corrupted on disk since the beginning of the transfer in the event of a restarted transfer).

  77. Re:Compression by Eil · · Score: 2



    No kidding. I read the entire article and didn't see just where their innovation lies... behind all the tech jargon and trademarked methods, it just sounds like fancy compression to me. I don't really understand the math behind this, but from what other people are saying, there doesn't seem to be much that hasn't been thought of before.

    Something definitely smells fishy when the CEO is saying that FedEx is more reliable than FTP... I've downloaded more 20+MB Linux kernels than I can even count, and not one of the resulting files has ever had a CRC error. I have, however, known many people to have problems with packages they've shipped or recieved via FedEx.

    I think the reason chip designers would FedEx their designs is more for security reasons (certified mail and insurance, etc) than reliability ones.

    This article just reeked of boneheadedness, IMHO.

  78. RFCs by Cato+the+Elder · · Score: 2

    There's a difference between "stick with what we know works" and "Stick with the RFCs." RFCs are dynamic. The TCP and UDP ones have both gone through several revisions. The difference is if you stick with the RFCs you have a bunch of engineers from a bunch of different companies, schools, etc. look at your protocol and make sure it can be implemented from the spec. They also try to balance out its effect on internet traffic as a whole.

    UDP is a perfectly legitimate method of moving data, especially if you can tolerate loss. It has been the general position of the IETF for years that you shouldn't reimplement retransmission--that's TCPs job. However, from descriptions it sounds more like this is a smart form of connection blasting, which isn't reinventing the wheel but can be hard on networks.

    Speaking of that, it isn't the network that breaks because of UDP--it's your program. If you start sending large amounts of UDP data and chewing up the bandwith on my router, according to the spec I can start not even attempting to process the packets. This is the problem with blasting protocls--by taking up more bandwith than TCP for the same data transfer rate they invite administrative blockage.

  79. This was done over a decade ago. by The+Panther! · · Score: 2

    What's the big deal? Direct connections did this on BBS's a long time ago. Simply, streaming all the data without flow control (Xmodem and early YModem required ACK/NAK), and let the client request the packets that it is missing leisurely. ZModem was the first to allow out of order delivery, from what I recall, and it revolutionized transfers with maximized bandwidth consumption.

    The only thing this particular implementation allows is connecting at an arbitrary time and listening to a continuous loop of packets until you get them all. ZModem could easily be modified to do exactly this, except with checksum data per packet rather than XORing chunks of packets or using symbolic representations. I'm not read up on FEC, but if you're transmitting already compressed data (near the bit distribution of entropy), no alternate representation is advantageous than sending the raw data.

    --
    Any connection between your reality and mine is purely coincidental.
  80. Re:Integer math -- even simpler than that! by RackinFrackin · · Score: 2, Informative

    Yes. CDs encode data using two Reed-Solomon Codes. Each sample (44,100 per second) records two amplitude measurements (one for each channel), as a 16-bit words. Thus there are 4 bytes/sample. Measurements from 6 consecutive samples are grouped together, and encoded using the Reed Solomon code RS(2^8,5). This adds 4 bytes of redundancy to the 24 bytes of data. After that, the 28 resultant bytes of data are interleaved, then encoded in a a shortened Reed-Solomon code that adds another 4 bytes of redundancy. Then one more byte is added that is used for some type of control purpose (i don't know exactly). After that, a translation is used to make the physical reading/writing more feasable, three bits are added for some reason after the translation, then a 27 bit word is added for syncing purposes.

    Overall 6 "ticks" of stereo music contain 196 bits of data, which is expanded to 588 bits written to the disc. The end result is that any error burst of 8871 or fewer bits of data on the disc can be corrected.

  81. Freenet by mOdQuArK! · · Score: 2

    Actually, if you think about it, this kind of approach would work pretty well for an encrypted/distributed sharing network like Freenet, where individual nodes of the may or may not be available. You can take all of the pieces of the given file & spew them to lots of different parts of the network, then a client can just go around asking different nodes in the network for any pieces of that file until the client has collected enough of the pieces to form the whole.

  82. AAAAAGH!!!! by peccary · · Score: 2

    why do people have to take a technology which is perfectly reasonable and useful within a certain problem domain, and turn it into snake oil?

    I read that whole article, just praying I would see the name "Shannon" or the word "congestion" in there, or even TNSTAFL, but nooooo.

    What these guys are doing is absolutely wonderful when

    - the bandwidth-delay product is greater than the size of the data to be transferred

    - upstream bandwidth is far more expensive than downstream bandwidth

    - the receiver is stealthed

    - the communications link is half-duplex.

    - the receiver has cheap, fast*, temporary storage available. (faster than the network, anyway).

    - occult knowledge is being transmitted, and the parties wish to have some deniability

    What they are doing is absolutely horrible when the network links are shared with other people -- unless they have some other means of congestion control.

    That's not unreasonable. Couldn't somebody just say as much?

  83. Congestion control? by Cato · · Score: 2

    This sounds like a problem for congestion control - UDP apps frequently don't have TCP-friendly congestion control, so they can get an unfair share of the network's bandwidth, affecting other applications that use TCP or TCP-friendly UDP protocols. FLID sounds like their form of congestion control, but if they are doing better than TCP it's quite likely it is more aggressive in its network impact. However, handling long burst errors sounds like a useful thing - of course, this can mask interesting errors in network configuration, e.g. lack of FEC on a wireless/satellite link, or lack of Frame Relay traffic shaping on a router with a FR link (this happened to a customer yesterday...).

  84. Re:Redundancy by BlueUnderwear · · Score: 2
    And how is this a win over just sending the same packets multiple times?

    Sending the same packet multiple time leads to a higher volume (you at least double the volume), for actually less fault tolerance.

    Suppose you have a file made up of a hundredblocks, and that we have a 5% probability of packet loss.

    In case you transmit each packet twice, you have for each packet a 0.05 * 0.05 probability = 0.25% that you loose both. This may seem small, but if you calculate it, you'll see that the probability of that not happening for any of the 100 packets is quite small: (1-0.0025)^100 = 77.8%. So, you have roughly one chance in 4 of not transmitting your file in its entirety. And to achieve this modest result, you needed to transmit 200 packets, rather than 100.

    Now compare this to FEC: you transmit a number of additional packets which can be used to compensate for any loss, not just for its companion packet. Just adding 20 more packets brings the total probability of a successful complete transmission to 99.9999% (that's the probability that less than 20 packets are lost out of 120, given an individual loss probability of 5%). So you transmit less packets (120 rather than 200), and you get a higher assurance of overall transmission success.

    --
    Say no to software patents.