UDP + Math = Fast File Transfers

Kazaa does that by DOsinga · 2001-12-13 01:55 · Score: 2, Informative

User would go to download a game demo or something, receive pieces from several different places, and knit them together?

The file sharing networks based on FastTrack technology do that. You download a movie or game from different users at the same time. Kazaa stitches it back together.

Re:Kazaa does that by zachhendershot · 2001-12-13 01:56 · Score: 2

A program called Download Accelerator does that too. It splits up a file from a single location and downloads different chunks at the same time. Worked pretty well in my limited experience.
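
How Download Accelerator actually schedules its chunks isn't described here. A minimal sketch, assuming the server honours HTTP `Range` requests and reports the file size up front, of splitting one file into contiguous byte ranges that could be fetched in parallel and concatenated in order (`split_ranges` and `range_header` are hypothetical helper names):

```python
def split_ranges(size: int, parts: int) -> list[tuple[int, int]]:
    """Split bytes [0, size) into `parts` contiguous, inclusive byte ranges."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)  # never create empty ranges
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        end = start + length - 1                 # HTTP Range ends are inclusive
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """The request header a client would send for one chunk."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    for start, end in split_ranges(10_000_000, 4):
        print(range_header(start, end))
```

Each range could go over its own connection; the client only has to write every chunk at its starting offset.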
Re:Kazaa does that by Kingpin · 2001-12-13 02:12 · Score: 1

To what purpose? I'd say that bandwidth is the limiting factor in 9 out of 10 cases, no? So unless the site you're downloading from has a bandwidth/connection policy, the accelerator doesn't help you a whole lot.

--
Unable to read configuration file '/bigassraid/htdig//conf/14229.conf'
Geocrawler error message.
Re:Kazaa does that by autocracy · 2001-12-13 02:16 · Score: 2

Yes, but in many cases it's the sending end that bottlenecks. Major FTP sites have some serious bandwidth, but during the day it gets split pretty rough. Get it from many sending ends, and suddenly you're pulling all you can handle.

--
SIG: HUP
Re:Kazaa does that by Anonymous Coward · 2001-12-13 02:24 · Score: 0

You've obviously never tried to download the latest Counter-Strike update on the day it was released then. :-) You can be sitting on a DS-3 but if the sites are overloaded you're still only going to get it at a couple of kilobytes a second. Stitch together 10 or 20 of these and it is reasonably fast.
Re:Kazaa does that by Lozzer · 2001-12-13 02:46 · Score: 1

If you are sharing the bandwidth at your end, then it can help promote your download to the detriment of others at your site, unless your site has fairer load balancing than a standard TCP stack, where each connection just grabs what it can (this tends to balance out over all connections, not over a quota per machine).

--
Special Relativity: The person in the other queue thinks yours is moving faster.
Re:Kazaa does that by Stinking+Pig · 2001-12-13 02:58 · Score: 1

I don't think so, no. A single TCP session will reach a "saturation point" when crossing a WAN -- I've been doing some modeling and tests that suggest it's around 40 Mbps for a North American coast to coast application (assuming 1% of packets are dropped and an average latency of 90 ms). The saturation is caused by the impact of latency on acks and the effect of dropped packets on retransmits.
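
The latency half of that argument can be made concrete with the bandwidth-delay product. A minimal sketch, assuming an idealised sender that keeps at most one window of data in flight per round trip and ignoring loss entirely (so it says nothing about the 1% drop rate or the 40 Mbps modelling result above); the function names are hypothetical:

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the pipe full."""
    return rate_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is outstanding per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.090  # the post's 90 ms coast-to-coast latency
    print(f"window needed for 40 Mbps: {bdp_bytes(40e6, rtt):,.0f} bytes")
    print(f"ceiling with a 65,535-byte window: {window_limited_bps(65_535, rtt) / 1e6:.1f} Mbps")
```

Without TCP window scaling the window field tops out at 65,535 bytes, which is one reason a single stream stalls on long, fat paths.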

Anything that decreases the amount of data to transmit helps a little, but if you want to replicate a delta of 100 Gbytes decreasing the data to transfer by a few percent is nothing to write home about -- you need multiple streams, which introduces some ugly complexity when it comes to buffering and reconstruction.

UDP doesn't warm my cockles.

--
"Nothing was broken, and it's been fixed." -- Jon Carroll
Re:Kazaa does that by Pooua · 2001-12-13 11:42 · Score: 1

Slashdot: "User would go to download a game demo or something, receive pieces from several different places, and knit them together?"

DOsinga: "The file sharing networks based on fasttrack technology do that. You download a movie or game from different users at the same time. Kazaa stitches it back together."

Pooua: More to the point of the sarcastic Slashdot comment, *Napster* does that.

--
Taking stuff apart since 1969 (TM)
Re:Kazaa does that by asldihf · 2001-12-16 15:26 · Score: 1

Digital Fountain's transport, under discussion here, takes a data set (i.e. 01010100001010) and creates a continuous flow of unrelated linear equations describing the data set. Then it packs these random equations into UDP payloads and transmits them to the receiver. Each packet is unsequenced and independent of the others, which makes loss a moot point, since the next packet coming down the pipe is just as valuable as the one lost. When the receiver gets enough packets, it uses the equations it has gathered to 'solve' for the original data set. Also, the overhead is minimal: if it takes 1000 packets to send a file with TCP, it takes 1050 with Digital Fountain.
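
The thread doesn't spell out Digital Fountain's actual codes, and published fountain codes (such as LT codes) use sparse equations so decoding stays cheap. As a stand-in, here is a minimal sketch of a dense random linear fountain over GF(2): every packet is the XOR of a random subset of fixed-size blocks, the subset travels with it as a bitmask (an assumption; a real protocol might send a seed instead), and the receiver runs Gaussian elimination until it holds as many independent equations as there are blocks. Which packets get lost doesn't matter, only how many arrive. All names here are hypothetical.

```python
import random


def split_blocks(data: bytes, block_size: int) -> list[int]:
    """Zero-pad data to a whole number of blocks; return each block as an int."""
    padded = data + b"\0" * (-len(data) % block_size)
    return [int.from_bytes(padded[i:i + block_size], "big")
            for i in range(0, len(padded), block_size)]


def encode_packet(blocks: list[int], rng: random.Random) -> tuple[int, int]:
    """One 'equation': a random nonzero subset of blocks and the XOR of that subset."""
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(len(blocks))
    payload = 0
    for i, block in enumerate(blocks):
        if mask >> i & 1:
            payload ^= block
    return mask, payload


class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, k: int):
        self.k = k
        self.rows: dict[int, tuple[int, int]] = {}  # pivot bit -> (mask, payload)

    def add(self, mask: int, payload: int) -> bool:
        """Absorb one packet; return True once k independent equations are held."""
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in self.rows:
                self.rows[pivot] = (mask, payload)
                break
            row_mask, row_payload = self.rows[pivot]
            mask ^= row_mask
            payload ^= row_payload
        # If mask reduced to 0 the packet was redundant and is simply ignored.
        return len(self.rows) == self.k

    def solve(self) -> list[int]:
        """Back-substitute from the lowest pivot upward to recover every block."""
        solved = [0] * self.k
        for p in range(self.k):
            mask, payload = self.rows[p]
            for q in range(p):
                if mask >> q & 1:
                    payload ^= solved[q]
            solved[p] = payload
        return solved


def transfer(data: bytes, block_size: int, loss: float, seed: int = 0) -> tuple[bytes, int]:
    """Blast packets over a simulated lossy channel until the receiver can decode."""
    if not data:
        return b"", 0
    rng = random.Random(seed)
    blocks = split_blocks(data, block_size)
    decoder = Decoder(len(blocks))
    sent = 0
    while True:
        mask, payload = encode_packet(blocks, rng)
        sent += 1
        if rng.random() < loss:
            continue  # dropped: no ack, no retransmit request, just keep sending
        if decoder.add(mask, payload):
            break
    out = b"".join(b.to_bytes(block_size, "big") for b in decoder.solve())
    return out[:len(data)], sent


if __name__ == "__main__":
    original = b"Digital Fountain demo. " * 200
    recovered, sent = transfer(original, block_size=64, loss=0.3)
    print(recovered == original, "packets sent:", sent)
```

With this dense construction the receiver typically needs only a few packets more than the number of blocks, but elimination gets expensive as the file grows, which is the trade-off sparse codes address.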

getright by awptic · 2001-12-13 01:55 · Score: 4, Informative

The program michael is referring to is called GetRight, which can connect to several known mirrors of a file and download separate fragments from each.

Re:getright by poot_rootbeer · 2001-12-13 06:15 · Score: 1

...and it has nothing to do with the original poster's submission.

Honestly, is it too much to ask that if the editors MUST insert their own comments into a story submission, they at least try to make them somehow related?

A good idea, but old by Marx_Mrvelous · 2001-12-13 01:56 · Score: 1

This is a good idea, and pretty natural. But it isn't anything new. There are many problems to overcome, not the least of which is managing all the TCP/IP connections and doing the decompression/assembly.

Of course, when a 1GHz CPU costs about $90, I guess we can afford CPU-heavy file transfers.

--

Moderation: Put your hand inside the puppet head!

Re:A good idea, but old by cloudmaster · 2001-12-13 02:52 · Score: 3, Informative

Since it doesn't use TCP, I'll bet it won't have any problem handling the TCP part of "TCP/IP connections"... The networking end sounds really simple - send the amount of data to expect, wait for confirmation, send all of the data once *without* waiting for confirmation until it's all been sent. That's a lot easier than handling the overhead of TCP, which is the whole problem they're trying to solve.

BTW, it helps to read the article before posting "insightful" comments. :) There's a nice little demo at http://www.digitalfountain.com/technology/coreTechnology.htm
Re:A good idea, but old by Anonymous Coward · 2001-12-13 04:32 · Score: 0

It sounds really cool, and quite effective. I understand the principle of how it works, but how do you go about implementing the 'write recipe' and 'bake' parts? That seems like a tough problem.
Re:A good idea, but old by jmccay · 2001-12-13 05:16 · Score: 2

You would still want some form of error checking to make sure you got all the right data. I would hate to download something only to have a corrupt file.
I think the big problem will be in the assembly and disassembly. Instead of the bottleneck being on the network, it will be on your box, slowing everything else down.
I really don't trust this. I would hope they use integer math. Otherwise, you can end up with some problems due to the fact that you can't represent some numbers exactly in IEEE floating-point notation (which a lot of computers use to store decimal numbers).
I don't see this taking off. There is too much possibility to mess up.

--
At the next eco-hypocrisy-meeting, count the private jets used to get to the meeting. Should be interesting to see that
Re:A good idea, but old by ironfroggy · 2001-12-13 11:41 · Score: 1

Actually, they do cost ninety bucks. Less, even: for $89.99 you can get a 1.1 GHz, and I'm looking at slightly dated prices.

--
Question
http://www.ironfroggy.com/

Other Applications by Keeper+ofthe+Keys · 2001-12-13 01:56 · Score: 2, Informative

GetRight uses multiple sites to download from and then pieces them back together.

http://www.getright.com

Re:Other Applications by DaCool42 · 2001-12-13 09:06 · Score: 0

What they are talking about is eliminating the need to verify the arrival of packets, not simply download parts of a file from multiple sites.

--

----
All of whose base are belong to the what-now?

And cheap, too! by Tsar · 2001-12-13 01:57 · Score: 5, Funny

The Transporter Fountain sits alongside a switch or router, and one Transporter Fountain is needed at the sending and receiving ends of a connection. Prices will range between $70,000 and $150,000.

Oh, boy, I'm gonna stop by CompUSA on the way home and grab one of these.

Re:And cheap, too! by "Zow" · 2001-12-13 03:37 · Score: 2

No kidding - It's kind of like a FAX machine: That'll be about $70,000 if you just want to send or receive, or about $150,000 if you want to send and receive, give or take $10k.

-"Zow"
Re:And cheap, too! by Anonymous Coward · 2001-12-13 05:34 · Score: 0

Hah, I'll wait until the Sunday paper comes out, and they have a FREE Transporter Fountain ($70,000 retail, with $70,000 rebate) offer. Then I'll go down and find out they only had two in stock, and the employees bought both of 'em the second they arrived. No wait, they NeVeR do that, right? Right...

edonkey does the same i think by Anonymous Coward · 2001-12-13 01:57 · Score: 0

The eDonkey software does the same, I think, for downloading movies and other big files. It doesn't work perfectly, so could we perhaps spawn an open source project with an open protocol for this?

I believe it could be nice for Debian apt-get installs, or for distributing CD images of Linux software.

Prior Art by cperciva · 2001-12-13 01:58 · Score: 0, Flamebait

Here

--
Tarsnap: Online backups for the truly paranoid

Re:Prior Art by Anonymous Coward · 2001-12-13 05:15 · Score: 0

Prior art? This has nothing to do with patents, nerd.

Vectors... by CoolVibe · 2001-12-13 01:59 · Score: 2

Those would transfer the fastest, since they don't consist of bitmapped data, but just the instructions to create the image.

I wonder what equations are used to convert raw, unpredictable streams of data to formulas, and how come the formulas used aren't bigger than the sent packets themselves? They mentioned XOR, but that just sounds silly, because XOR does nothing with data except apply a reversible operation to it, which neither shrinks nor grows the data.

Does anyone have more info? It does sound interesting though...

Re:Vectors... by beff · 2001-12-13 02:05 · Score: 1

ISTM that this technology would only work when there is very little information per byte in the original data. Text, for example, has a little more than one bit of information per byte. That is why compression functions work so well: they compress the actual information. How would this technology fare on efficiently compressed data, or data that appears truly random (as well-encrypted data is supposed to appear)? All of the information theory that I learned in college indicates that there is a minimum number of bits that can represent any information. Once compressed to that point, you can't go any further.
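
A rough way to see this is to measure order-0 byte entropy and try deflate on redundant versus random input. Order-0 entropy treats each byte independently, so it overstates the information in text (the roughly one bit per byte figure needs context-aware models), but random or already-compressed bytes come out near 8 bits per byte and don't shrink. A minimal sketch using only the standard library (`randbytes` needs Python 3.9+; the function names are hypothetical):

```python
import math
import random
import zlib
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy: average bits per byte, ignoring all context."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def report(label: str, data: bytes) -> None:
    """Print the entropy estimate and the size after maximum-effort deflate."""
    packed = zlib.compress(data, 9)
    print(f"{label}: {bits_per_byte(data):.2f} bits/byte, "
          f"{len(data)} -> {len(packed)} bytes after deflate")


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 2000
    noise = random.Random(1).randbytes(len(text))  # stands in for compressed/encrypted data
    report("repetitive text", text)
    report("random bytes", noise)
```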
Re:Vectors... by CoolVibe · 2001-12-13 02:12 · Score: 3, Insightful

> Theory that I learned in college indicates that there is a minimum number of bits that represent any information. Once compressed to that point, you can't go any further.
Exactly. This is also the point where an equation to represent the data is going to end up bigger than the data it's trying to send. But it depends on the algorithm used, too. If the data may be sent out of order, one could try block-sorting and then compressing (like bzip2 does), but since this is UDP, out-of-order packets will either be dropped or not dealt with (I think).
DISCLAIMER: I am not a protocol god, nor am I trying to be. Just spouting my views :-)
Re:Vectors... by hburch · 2001-12-13 02:21 · Score: 5, Informative
Consider the following (almost certainly bad, but workable) scheme:
- Convert a chunk of the file into an order-k polynomial (use the coefficients of the polynomial to encode the chunk)
- Send the evaluation of the polynomial at several distinct locations (more than k+1).
- Receiver gets at least k+1 packets.
- Using math, it recreates the original polynomial, and thus the chunk.
Please note that I'm not saying this is a good scheme. It is just an example one, and one that doesn't detail the chunk polynomial conversion, which would be very important. There are several papers describing schemes where people have actually worked at making them tenable.

Modulo compression, if you want such a system to require only receiving k packets (although you send more than that), the sum of the size of the k packets must be at least the size of the original file (otherwise, you could use such a scheme to compress the file).
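
The polynomial scheme above can be tried out in a few lines. A minimal sketch, assuming arithmetic modulo the prime 257 (an arbitrary choice so every byte value is a field element): the chunk's bytes are the coefficients, packets are evaluations at x = 1..n, and any `len(chunk)` packets recover the chunk by Lagrange interpolation. An order-k polynomial has k+1 coefficients, so the number of points needed equals the chunk length. Function names are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def poly_eval(coeffs, x: int) -> int:
    """Horner's rule mod P; coeffs[i] is the coefficient of x**i."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return y


def encode(chunk: bytes, n: int) -> list[tuple[int, int]]:
    """Use the chunk's bytes as coefficients; sample the polynomial at x = 1..n."""
    if not len(chunk) <= n < P:
        raise ValueError("need at least len(chunk) distinct points inside the field")
    return [(x, poly_eval(chunk, x)) for x in range(1, n + 1)]


def mul_linear(poly: list[int], root: int) -> list[int]:
    """Multiply a coefficient list by (x - root), mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out


def decode(points: list[tuple[int, int]], size: int) -> bytes:
    """Lagrange interpolation through any `size` received points."""
    pts = points[:size]
    if len(pts) < size:
        raise ValueError("not enough packets yet")
    coeffs = [0] * size
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(pts):
            if m != j:
                basis = mul_linear(basis, xm)
                denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse needs Python 3.8+
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % P
    return bytes(coeffs)


if __name__ == "__main__":
    chunk = b"hello"  # 5 bytes -> order-4 polynomial, 5 unknown coefficients
    packets = encode(chunk, n=8)
    survivors = [packets[i] for i in (0, 3, 5, 6, 7)]  # 3 of the 8 packets lost
    print(decode(survivors, len(chunk)))
```

One wart shows up immediately: an evaluation can equal 256, so each sample needs 9 bits to carry 8 bits of data. Evaluating a message polynomial at many points is essentially the classic Reed-Solomon construction, and practical implementations work in GF(2^8) to avoid exactly this.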
Re:Vectors... by Izmunuti · 2001-12-13 02:33 · Score: 1, Informative

It sounds sort of like how RAID (5?) works. Take the original data, expand it with some redundancy, and then splat it across 5 drives. If any one drive goes down, the data from the other four can be combined to recover it. The data is "expanded" 5:4. The math to do the recovery involves a ton of XOR operations. Some hard drives have nifty hardware support for this to speed things up.
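
The RAID 5 style recovery described here is easy to sketch: the parity block is the XOR of the data blocks, so the XOR of any four surviving blocks reproduces the fifth. A minimal sketch with hypothetical helper names; real RAID 5 also rotates the parity block across drives, which is ignored here.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Append a parity block (XOR of all data blocks): 4 data + 1 parity = 5:4."""
    if len({len(b) for b in data_blocks}) != 1:
        raise ValueError("blocks in a stripe must all be the same size")
    return data_blocks + [reduce(xor_bytes, data_blocks)]


def rebuild(stripe: list[bytes], lost: int) -> bytes:
    """Recover the block at index `lost` by XORing every other block in the stripe."""
    return reduce(xor_bytes, (b for i, b in enumerate(stripe) if i != lost))


if __name__ == "__main__":
    stripe = make_stripe([b"UDP+", b"Math", b"=Fas", b"t!!!"])
    print(rebuild(stripe, 2))  # the third "drive" died; XOR of the rest restores it
```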

I bet these guys have something like that: add a bit of redundancy, split things up, blast it out over UDP to the destination. The receiving end puts it back in order and tries to recover from missing packets using the XOR operations. It takes more bandwidth but still may be faster than FTP, since it doesn't spend most of its time doing handshakes.

I don't get why the box costs 70-150 kilobucks though. Yikes.
Re:Vectors... by Anonymous Coward · 2001-12-13 03:07 · Score: 0

Sounds to me like they are doing the equivalent of uuencoding a file,
turning off flow control on the line,
and then cat-ing.
leesssie.... when did we all used to
do that?

Or even better... out of order packets, sliding windows, encoded redundant data?
can somebody say C-Kermit?
Just a thought.

This whole idea is lame. TCP is great, yeah FTP isn't the greatest, but do we care?
scp, rsync, http, are all "better" protocols in this respect.

If your throughput sucks,
you are running shitty hardware.
Re:Vectors... by JDawgX1zN0z · 2001-12-13 04:16 · Score: 1

I believe you're referring to binary Huffman codes? However, this is only one form of compression. Consider a pseudo-random number generator with only 3 integer parameters (a multiplier, a modulus and a seed). These 3 integers can generate a unique sequence of pseudo-random bits. The trick is to reverse the process, going from a sequence back to just those 3 numbers. The method described here doesn't use a PRNG, but instead uses a system of linear equations and solves for the variables as the data symbols.

I think you can represent large amounts of data in less and less information, given large enough computing power and large enough mutual client/server databases.
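
The generator described here is a multiplicative congruential generator, which really does need only a multiplier, a modulus and a seed. A minimal sketch, using the Park-Miller "minimal standard" parameters as an illustrative default (the post doesn't name any), that turns those three integers into an endless bit stream; `lehmer_bits` is a hypothetical name:

```python
from itertools import islice


def lehmer_bits(seed: int, a: int = 16807, m: int = 2**31 - 1):
    """Multiplicative congruential generator x -> a*x mod m; yields the top bit
    of each 31-bit state (0 or 1)."""
    if not 0 < seed < m:
        raise ValueError("seed must be in 1..m-1")
    x = seed
    while True:
        x = a * x % m
        yield x >> 30


if __name__ == "__main__":
    print("".join(str(b) for b in islice(lehmer_bits(1), 64)))
```

Going the other way is the catch: for a fixed multiplier and modulus there are at most m - 1 seeds, so at most m - 1 distinct streams, while there are 2^n possible n-bit files. Almost every file therefore has no short (a, m, seed) description, which is the same counting argument raised elsewhere in the thread.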
Re:Vectors... by nomadic · 2001-12-13 04:27 · Score: 2

You mean you can't compress a megabyte file to one byte by pkzipping, then pkzipping the zip file, then pkzipping that file, etc.? Damn, there goes my Nobel, really thought I had made a breakthrough with that algorithm.
Re:Vectors... by sketerpot · 2001-12-13 04:40 · Score: 1

Damn, there goes my Nobel, really thought I had made a breakthrough with that algorithm.

So did I, once. :-)
Then I found out about how compression works, after pkzipping a zip file and discovering that it became bigger. Big disappointment. I guess you just have to get it through your head that computers aren't magic.
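
The zip-a-zip experiment is easy to repeat with Python's `zlib` (deflate, the algorithm behind PKZIP's usual method). A minimal sketch: the first round shrinks redundant input, but after that the data is already close to random, and later rounds generally add a few bytes of framing rather than saving any. `zip_repeatedly` is a hypothetical name; `randbytes` needs Python 3.9+.

```python
import random
import zlib


def zip_repeatedly(data: bytes, rounds: int) -> list[int]:
    """Sizes before and after each successive round of deflate compression."""
    sizes = [len(data)]
    for _ in range(rounds):
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes


if __name__ == "__main__":
    words = [b"zip", b"file", b"nobel", b"prize", b"again", b"magic"]
    rng = random.Random(0)
    text = b" ".join(rng.choice(words) for _ in range(50_000))
    print("text:  ", zip_repeatedly(text, 4))
    print("random:", zip_repeatedly(rng.randbytes(50_000), 4))
```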
Re:Vectors... by Anonymous Coward · 2001-12-13 04:41 · Score: 0

I don't think they care about size being smaller or bigger. I think they care that it's smaller than, the same size as (or maybe even a little bit larger than) all the information sent back and forth by the FTP protocol for that file. Also, since there is no waiting for round trips and no 'windows', it should be quite fast even if it's bigger. The only thing the client cares about is whether it has got enough different algorithms to regenerate the data.

On a side note: it seems like the server floods the network if it is ignorant of network conditions.
Re:Vectors... by Anonymous Coward · 2001-12-13 04:44 · Score: 0

This is like the "PAR" files in the alt.binaries.* newsgroups, which are based on the RAID idea. Due to the high volume, a regular ISP's news server might lose some files. If you have enough of the data files plus PAR files, you can reconstruct the original files.
Re:Vectors... by evanbd · 2001-12-13 04:50 · Score: 2

A far faster scheme would use a flat surface in n-space. Basically, instead of a curve on the xy plane, if you need to send 4 numbers, you use w = ax + by + cz + d, a-d being the numbers. Send points on the surface (at least 4) and the surface can be reconstructed. Not fundamentally different, but you try solving a 27th degree polynomial equation, even with a computer...

Oh, and your scheme actually only needs the original number of packets: an order-k polynomial stores k+1 numbers (order 2 is ax^2 + bx + c, which gives three numbers; you need three points to recreate it: three equations, three unknowns).
Re:Vectors... by AYeomans · 2001-12-13 05:10 · Score: 1

But it's quite simple to compress any file to 50 bits or so. Let's say there are 10^10 computers in the world (more than one each), with 10^5 files on each. So there are only 10^15 files in the world. As 10^15 ~= 2^50 you only need 50 bits to represent any file in the world.(:-)

Shame about the amount of directory storage needed in this scheme. Mind you, Google's getting close!
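
Spelled out, the post's (tongue-in-cheek) arithmetic is:

```latex
\log_2\!\left(10^{10}\ \text{computers} \times 10^{5}\ \tfrac{\text{files}}{\text{computer}}\right)
  = \log_2 10^{15} = 15 \log_2 10 \approx 15 \times 3.32 \approx 49.8\ \text{bits}
```

The joke rests on the pigeonhole principle: a 50-bit name can pick out at most 2^50 ≈ 1.13 × 10^15 distinct files, so it only works as an index into a directory that already holds every file, which is where all the storage goes.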

--
Andrew Yeomans
Re:Vectors... by gowen · 2001-12-13 06:58 · Score: 2

Not fundamentally different, but you try solving a 27th degree polynomial equation, even with a computer...
Nobody's suggesting the data is hidden in the roots (zeroes); it's in the coefficients. Whilst finding the roots of an order-27 polynomial is hard (and Galois theory says "forget it", at least if you want exact solutions), interpolating one through 28 data points is easy...

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Vectors... by jtdubs · 2001-12-13 07:13 · Score: 2

Let me get this straight, we use the bytes of data as coefficients in a huge polynomial. The number of terms in the polynomial equals the number of bytes in the message, and we call that k. We then sample the polynomial at k+1 various places and send the samples to the receiver. It reconstructs the polynomial and hence the data.

If we have k coefficients in the polynomial, why would we send k+1 samples rather than just the k coefficients? This would alleviate the need for the receiver to reconstruct the polynomial and would mean one less piece of data to send.

Hey, wait a minute, if the k coefficients are the actual data, and that's what we are sending, then we are just sending the data.

Wasn't that easier AND smaller than the polynomial idea. :-)

You should wander over to sci.crypt and sci.crypt.random-numbers. Maybe you can convince them that this method will let you compress random data also.

To elaborate, here's the general problem you are running in to:

With any sufficiently random data, the size of the polynomial tends to be equal to the size of the data and hence doesn't help you. In fact, with any pattern you can find in the data, it will tend to take as much or more space to encode what pattern you found than just to encode the original data. That's the nature of the beast.

Justin Dubs
Re:Vectors... by gowen · 2001-12-13 07:28 · Score: 2

if the k coefficients are the actual data, and that's what we are sending, then we are just sending the data.
What you say is true, but you've missed the point. If you send the k polynomial coeffs and one gets lost, I have to ask you explicitly for the dropped one. If you send k+1 sampled data points and any one gets dropped, I can still reconstruct the data.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Vectors... by Mr+Z · 2001-12-13 07:28 · Score: 1

They never said anything about compression. Their technology is all about eliminating the throttling effect of TCP acknowledgements on a long haul high-bandwidth link. You can only grow TCP windows so large, and with TCP slow-start, only so fast.
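
How slowly slow start opens the window on a long path can be estimated with a toy model. A minimal sketch, assuming the congestion window starts at one segment and doubles every round trip with no loss, no ssthresh cap and no delayed ACKs (real stacks differ on all of these), with a hypothetical 1460-byte segment and the 90 ms coast-to-coast latency quoted earlier in the thread:

```python
import math


def slow_start_rtts(target_segments: int, initial_window: int = 1) -> int:
    """Round trips for an idealised slow start to reach `target_segments`.

    Assumes cwnd doubles every RTT with no loss, no ssthresh cap and no delayed ACKs.
    """
    if initial_window < 1:
        raise ValueError("initial_window must be at least one segment")
    rtts, cwnd = 0, initial_window
    while cwnd < target_segments:
        cwnd *= 2
        rtts += 1
    return rtts


if __name__ == "__main__":
    rtt_s = 0.090  # coast-to-coast round trip
    mss = 1460     # hypothetical segment size in bytes
    target = math.ceil(40e6 * rtt_s / 8 / mss)  # window that would fill 40 Mbps
    n = slow_start_rtts(target)
    print(f"{target} segments: {n} RTTs, about {n * rtt_s:.2f} s before the pipe is full")
```

Longer round trips hurt twice: every doubling takes longer, and a bigger window is needed to fill the same pipe.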

I once saw an article on using TCP for interplanetary work, and they showed that RTT was the bandwidth limiter (bigtime!) due to how the protocol is constructed.

These "Fountain" guys are not about compression. They're about sending XOR blocks to fill in gaps, doing essentially blind-retransmits until the other end says "Ok, I got it all now!" Ick. The XORing bit just apparently helps reduce the number of needed "proactive retransmits."
--Joe

--
Program Intellivision!
Re:Vectors... by EdMcMan · 2001-12-13 09:16 · Score: 1

Can we say... postscript?

Compression? by mi · 2001-12-13 02:00 · Score: 3, Interesting

In essence, is not this the same as file compression? The amount of information is the same (for those who remember what a bit is). It is just that the usual one character per byte is awfully wasteful, which is why the various compressors are so effective.

Add a modern data transfer protocol and you may get some startup money :-)

--
In Soviet Washington the swamp drains you.

Re:Compression? by s20451 · 2001-12-13 02:20 · Score: 4, Informative

In essence, is not this the same as file compression? The amount of information is the same (for those who remember what a bit is).

It is more than merely compression. The received data is compressed, which saves transmission time, but this technology is already well known (and the company isn't claiming a compression rate better than entropy, or anything else silly). The innovation here is the elimination of acknowledgement or ARQ packets. I'm speculating here, but it looks like they are encoding the data by transforming a file into a huge "codeword" -- when the codeword is transmitted, the receiver waits for enough packets to correctly decode the codeword, which results in the recovery of the file. There's no need for ARQ or TCP because transmitting extra codeword elements will automatically correct any errors incurred in transmission.

--
Toronto-area transit rider? Rate your ride.
Re:Compression? by bpowell423 · 2001-12-13 02:57 · Score: 1

Not quite. You're right about flooding the network, but the algorithm doesn't care about the order of the packets, or even whether it gets them all.

compression? by 4im · 2001-12-13 02:00 · Score: 1

How is this different from a nifty compression and transmitting slightly differently?

To me, this sounds like a mix of compression and protocol, not necessarily that groundbreaking.

If it works, cool. But I guess it won't be that efficient on that old 486 Linux router...

Flow Control by Detritus · 2001-12-13 02:00 · Score: 3, Informative

You still need some form of flow control or rate limiting, otherwise a large percentage of the UDP packets are going to get dropped. Plus, you have the problem of UDP streams stealing bandwidth from TCP streams on a limited bandwidth link.

--
Mea navis aericumbens anguillis abundat

Re:Flow Control by Omnifarious · 2001-12-13 02:20 · Score: 3, Interesting

Quite correct. This protocol does not sound at all TCP friendly. It needs some way of dynamically responding to network conditions to be that way. Even something so simple as doing an initial bandwidth test, then rate limiting the UDP packets to 90% of that would be a big help, though for a large file that would still create a lot of congestion problems.
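
The 90%-of-measured-bandwidth idea could be enforced with a token bucket, a standard rate limiter chosen here for illustration. A minimal sketch, assuming a hypothetical bandwidth-test result of 10 Mbps and simulated time so it runs instantly; as the post says, a fixed cap still doesn't react when other traffic shows up. Class and function names are hypothetical.

```python
class TokenBucket:
    """Permit sends at `rate_bps` on average, with bursts up to `burst_bytes`."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate = rate_bps / 8          # refill rate in bytes per second
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the previous refill, seconds

    def try_send(self, now: float, size: int) -> bool:
        """Refill for the time elapsed, then spend `size` tokens if available."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


def simulate(rate_bps: float, burst: int, packet: int, duration: float, step: float) -> int:
    """Offer one packet every `step` seconds; return how many bytes got through."""
    bucket = TokenBucket(rate_bps, burst)
    sent = 0
    for i in range(round(duration / step) + 1):
        if bucket.try_send(i * step, packet):
            sent += packet
    return sent


if __name__ == "__main__":
    measured_bps = 10e6             # hypothetical bandwidth-test result
    limit_bps = 0.9 * measured_bps  # the post's 90% rule
    # Offer 1500-byte packets every 100 microseconds (about 120 Mbps) for one second.
    sent = simulate(limit_bps, burst=15_000, packet=1500, duration=1.0, step=1e-4)
    print(f"{sent * 8 / 1e6:.2f} Mbit sent in 1 s against a {limit_bps / 1e6:.1f} Mbps cap")
```

The small excess over the cap in the first second is the initial burst allowance; after that the sender is held to the configured rate however fast it tries to send.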

Does anybody know if IPv6 has any kind of congestion notification ICMP messages, so stacks can know to throttle applications when they are blasting out data that's congesting the network?

--
Need a Python, C++, Unix, Linux develop
Re:Flow Control by M100 · 2001-12-13 02:27 · Score: 2

It doesn't matter if you lose data in the stream. You actually send more data than the original file, but the receiver only has to receive approximately the same amount of data as the original file (about 5% more, if I remember correctly). Thus you can send 100 KB for a 50 KB file, but the receiver only needs to get about half the packets, so loss is not a problem.
Think about simultaneous equations. If I have 2 unknowns I can solve it if I have two equations. Now imagine that I send you 4 equations, each with the two variables. You only need to receive 2 of the equations to be able to reconstruct the original two pieces of data.
The other neat thing about this is that you can multicast traffic, and each receiver can start listening whenever it wants, so a receiver that starts listening halfway through can still get the whole file!
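
The two-unknowns example in this post can be checked directly. A minimal sketch using exact fractions and Cramer's rule, with coefficient pairs (1, 1), (1, 2), (1, 3), (1, 4) chosen (hypothetically) so that any two of the four equations are independent; real codes do the same kind of thing over finite fields so packet sizes don't grow. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def make_equations(x: int, y: int, coeffs: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Sender side: each equation a*x + b*y = c travels as the triple (a, b, c)."""
    return [(a, b, a * x + b * y) for a, b in coeffs]


def solve_pair(e1: tuple[int, int, int], e2: tuple[int, int, int]) -> tuple[Fraction, Fraction]:
    """Receiver side: Cramer's rule for two equations in two unknowns."""
    (a1, b1, c1), (a2, b2, c2) = e1, e2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("equations are not independent")
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)


if __name__ == "__main__":
    sent = make_equations(7, -3, [(1, 1), (1, 2), (1, 3), (1, 4)])  # 4 equations
    for e1, e2 in combinations(sent, 2):  # whichever 2 of the 4 arrive
        print(e1, e2, "->", solve_pair(e1, e2))
```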
Re:Flow Control by DrStrange · 2001-12-13 02:29 · Score: 1

TCP has some congestion control already, fast retransmission followed by additive increase-multiplicative decrease...but the problem with this idea when I read it is that using UDP you have no guarantee that the "equations" are getting to the receiver. Maybe I missed something but it seems to me that if an equation is lost there is no way to complete the information transfer and no way to tell the sender "hey, I didn't get it!"
Re:Flow Control by Omnifarious · 2001-12-13 02:33 · Score: 5, Interesting

The 'equations' are broken up into pieces such that if you receive any N pieces, you can reconstruct the entire file. It's like how some key sharing schemes work, Publius for example: any N 'shares' of the key can be used to reconstruct the entire key. In this case the 'key' is the whole document, and I'm betting they use a different sharing scheme than the ones already used for cryptography.

--
Need a Python, C++, Unix, Linux develop
Re:Flow Control by kmacleod · 2001-12-13 02:42 · Score: 2, Interesting

I think you missed the point. Sure, you need only to sip from the fire hose to finally quench your thirst, but that wastes a lot of water.

Where TCP acks and controls flow on every single packet, which also must be received by the sender in order, a protocol like this can occasionally send back a report on what kind and how many packets it is receiving (fragmented, packet loss due to bandwith, packet loss due to congestion).
Re:Flow Control by bpowell423 · 2001-12-13 02:43 · Score: 2

It looks to me like they don't care if packets get dropped. The sender just keeps jamming stuff down the pipe until the receiver gets enough. Yeah, seems to me they need some kind of flow control, and I definitely think it'd be a bandwidth hog.
Re:Flow Control by Zaak · 2001-12-13 03:07 · Score: 1

Flow control is great and everything, but the sort of transmission the article is referring to is only really useful for multicast. (It's too bad they didn't actually mention that...)
The problem with using flow control for multicast is that if your transmission has 5000 recipients, and they are all sending you flow control messages, you could easily get more upstream data coming at you than you are trying to send.
This sort of protocol needs to have an anti-multicast flow control method. In other words, the routers have to gather flow control messages and send a digest of them to their upstream router. That would enable the routers themselves to understand what's going on and participate in controlling congestion. And incidentally, the source of the transmission would only get as much flow control information as if it were talking to a single recipient.
Re:Flow Control by Anonymous Coward · 2001-12-13 04:54 · Score: 0

You can also ask the sender to send more algorithms if the sender times out.
Re:Flow Control by TurboDog99 · 2001-12-13 07:48 · Score: 1

I've thrown around the possibility of doing something like this for years. It's really sad that some company gets to "own" a method that many people have thought of just because they were the first with the money and resources to go through the patent process. That any data could be represented by what may be an impossibly complex equation isn't news. Also, if my memory serves me right, part of my Calculus II class described a process called Taylor approximation, which approximates an extremely complex function with a sum of simpler terms. The idea has been around in the math world for years.

A few companies using expensive satellite links or something might use this, but I don't really see the use. The Internet as a whole would be much faster if the RIAA didn't screw up every distributed file sharing concept that comes around. A lot less bandwidth would be used if you could, for example, download a Linux kernel from the guy next door who just downloaded it, instead of everyone downloading from the same place because people are too lazy to use mirrors. Something like Freenet with built-in digital signature checking may eventually be promising for this sort of thing if it doesn't become illegal. If you want to talk about speeding up the system as a whole, put the data where it is most needed.
Re:Flow Control by Anonymous Coward · 2001-12-26 06:50 · Score: 0

The precious little fairy! Come over here you dixie cup!

Flashget by xantho · 2001-12-13 02:00 · Score: 1

Sounds like flashget/jetcar to me. It's been available for quite some time. Tell that to the USPTO!

Compression by heikkile · 2001-12-13 02:01 · Score: 1, Funny

So, someone has invented a data compression technique, and applied it over a communication channel. The only original thing in the article was the clever marketing ploy to describe this old technique as something new and wonderful...

--

In Murphy We Turst

Re:Compression by O2n · 2001-12-13 02:14 · Score: 1

Yep, that seems to be the case.
That, or someone didn't get it (the author? the marketing guy?)

The quirk is that none of the data is ever transmitted; the receiving end creates its own copy of a file based on a complete set of mathematical equations.

This simply doesn't work. If you have something already compressed (no redundancy), say a .zip file or a .jpeg picture, there is no "set of mathematical equations" that considerably reduces the data size. Note that because JPEG is a lossy algorithm, it can achieve higher compression rates than lossless algorithms (in theory). And you can't talk about lossy compression in the same sentence as data backup. :)

They may have designed something that speeds up transfers, that's not relying on the exact packet sequence etc. - but it's not spelled out in the article.
Re:Compression by M100 · 2001-12-13 02:24 · Score: 1, Redundant

No, it's not compression. You actually need to send more data than the original file, but the receiver only has to receive approximately the same amount of data as the original file (about 5% more, if I remember correctly).
Think about simultaneous equations. If I have 2 unknowns I can solve it if I have two equations. Now imagine that I send you 4 equations, each with the two variables. You only need to receive 2 of the equations to be able to reconstruct the original two pieces of data.
Re:Compression by schon · 2001-12-13 02:43 · Score: 1

OK, so the sender has to send more data than the original file, the receiver has to receive the same amount as the original file..

tell me again how this makes the file transfer faster?

If you're sending more data, you're using more bandwidth - why not just use TCP?
Re:Compression by hyoo · 2001-12-13 03:15 · Score: 2

All the data doesn't need to come from the same server. I suppose it is like load balancing.

If you have 20kbps to mirror A and 80kbps to mirror B, servers A and B start spamming you with packets of whatever you are downloading. Then 20% of the packets you receive come from A, and 80% come from B. Given all the partial packets you got, you can now reconstruct the porn DivX. This would be faster than getting 100% of the packets from B, which would take 25% longer.
Re:Compression by Carnivore · 2001-12-13 04:11 · Score: 1

The Big Point here is that they're getting away from TCP/IP's huge overhead. TCP makes a whole lot of sense if you're DARPA and you want to make a nuke-proof network for keeping the country running, but we don't need all of that bouncing SYN/ACK traffic. This method allows the target PC to say at some point, "okay. I got enough. I can make the file now" instead of replying about every packet.
Re:Compression by Eil · 2001-12-13 08:15 · Score: 2

No kidding. I read the entire article and didn't see just where their innovation lies... behind all the tech jargon and trademarked methods, it just sounds like fancy compression to me. I don't really understand the math behind this, but from what other people are saying, there doesn't seem to be much that hasn't been thought of before.

Something definitely smells fishy when the CEO is saying that FedEx is more reliable than FTP... I've downloaded more 20+MB Linux kernels than I can even count, and not one of the resulting files has ever had a CRC error. I have, however, known many people to have problems with packages they've shipped or received via FedEx.

I think the reason chip designers would FedEx their designs is more for security reasons (certified mail and insurance, etc) than reliability ones.

This article just reeked of boneheadedness, IMHO.
Re:Compression by Anonymous Coward · 2001-12-13 09:13 · Score: 0

It's all in there, you're just apparently too dense to understand that it's designed to eliminate the problem of dropped packets; it's not a compression scheme.
Re:Compression by Deflatamouse! · 2001-12-13 17:44 · Score: 1

What you described works for any method of file transfer. Even manual file transfer i.e. Person B types 4 times faster than person A. Therefore, after some time, when the file is copied, 80% of it was typed in by person B, and the other 20% was typed by person A. Nothing enlightening... makes me wonder why this post got a score of 2.

This 'new' scheme of transferring files is better described by Carnivore in the post below.

An open solution? by darrint · 2001-12-13 02:01 · Score: 1

I wonder if it's possible to duplicate this with an open solution. If this is really as revolutionary as they say then they've earned their patents. Could free/open hackers come up with something that delivers the same results but is unencumbered?

Kinda like IFS? by pointym5 · 2001-12-13 02:01 · Score: 2, Interesting

I mean it's not for image compression specifically, but it definitely reminds me of IFS image compression in some ways. I'll bet that compression is very time consuming, but that's fine if you're warehousing data. I wonder if the clients are pre-loaded with a body of parameterized functions, so that the server just sends information describing what functions to run and what the parameters are. I guess if it's all based on polynomials all it needs to send are vectors of constants.

Neat idea. Patents: here and here.

heh.. by Xzzy · 2001-12-13 02:02 · Score: 5, Insightful

> These files routinely are mailed on tape rather
> than transmitted electronically. "FedEx is a
> hell of a lot more reliable than FTP when
> you're running 20 Mbytes,"

Having worked in the industry they mention, I'd hazard that they don't use ftp more because of the illusion of security than anything else. People in the EDA world (which is where I worked, and has a close relationship with chip manufacturers) are immensely paranoid about people getting ahold of their chip designs, because if someone steals that.. you not only lose your next chip, you enable someone else to make it for you.

These people just don't trust firewalls and ftp yet, but they do trust putting a tape in an envelope and snail mailing it. At the very least it makes someone liable if the letter gets stolen, which you can't do with electronic transfers..

At any rate, ftp is plenty reliable for transferring 20mb files.. I do it every time a new game demo comes out. :P Maybe they meant 20gb. Cuz I've seen chip designs + noise analysis + whatever take dozens of gigs.

Re:heh.. by swb · 2001-12-13 02:13 · Score: 2

These people just don't trust firewalls and ftp yet, but they do trust putting a tape in an envelope and snail mailing it.

I've heard this said about the diamond business and the postal service. Diamond couriers, who are carrying just diamonds, can be tracked and robbed easily. Once a package enters the postal stream it's nearly impossible to steal that specific package.

I dunno if it's really true or not, but it has a certain counterintuitive logic that makes it believable.
Re:heh.. by ethereal · 2001-12-13 02:30 · Score: 1

Yeah, I thought that quote was nuts too. I can't remember the last time I had a 20MB ftp'd file that turned out to be corrupt - the reliability of FTP is not in question at all.
The security aspect makes a little more sense, but I can't believe that they're willing to trust the minimum-wage guys in the mail room more than a strongly-encrypted electronic file transfer. And if the USPS starts irradiating the mail, I think he'll see the reliability of mailing a tape go down very quickly :)

--
Your right to not believe: Americans United for Separation of Church and
Re:heh.. by Quixote · 2001-12-13 03:38 · Score: 1

I recall someone from the old days saying "Never underestimate the bandwidth of a truckload of tapes" :-)
Re:heh.. by Quixote · 2001-12-13 03:45 · Score: 1

Couldn't you get around the TCP ack problem by increasing the size of the sliding window? Alternately, couldn't you just tack on a "sequence" header at the beginning of the UDP packets, and periodically have the client send a list of packets that it had not received so far (from the sequence)?
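The "sequence header plus list of missing packets" idea can be sketched in a few lines. A minimal sketch, assuming a hypothetical 4-byte big-endian sequence number in front of each UDP payload; the helper names are illustrative, not any real protocol's API.

```python
import struct

HEADER = struct.Struct("!I")  # hypothetical 4-byte big-endian sequence-number header


def pack_packet(seq, payload):
    """Prefix a UDP payload with its sequence number."""
    return HEADER.pack(seq) + payload


def unpack_packet(datagram):
    """Split a received datagram back into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]


def missing_sequence_numbers(received, total):
    """Sequence numbers in range(total) that have not arrived: the list the client sends back."""
    seen = set(received)
    return [seq for seq in range(total) if seq not in seen]


arrived = [unpack_packet(pack_packet(s, b"data"))[0] for s in (0, 1, 3, 5)]
print(missing_sequence_numbers(arrived, 6))  # [2, 4]
```

The client only reports gaps periodically, so there is no per-packet acknowledgement traffic.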

I'm sure there are simpler solutions to the problem than what this company is peddling.

BTW: FEC has been used by NASA for decades. How else do you think we get data from Mars, Jupiter, Saturn, Uranus...........
Re:heh.. by regen · 2001-12-13 03:57 · Score: 4, Interesting

This is really funny. I used to work for the NYSE (Stock Exchange) and it wasn't uncommon to transmit 10GB of data over night via FTP to a single brokerage. The data that they were transmitting was at least as valuable as chip designs. It would allow you to alter trade position after trades had been transacted but before they cleared. (e.g. cancel a bad transaction or increase the amount of a good transaction)

--
The Economics of Website Security
Re:heh.. by Anonymous Coward · 2001-12-13 04:18 · Score: 0

These people just don't trust firewalls and ftp yet, but they do trust putting a tape in an envelope and snail mailing it. At the very least it makes someone liable if the letter gets stolen, which you can't do with electronic transfers..

At any rate, ftp is plenty reliable for transferring 20mb files.. I do it every time a new game demo comes out. :P Maybe they meant 20gb. Cuz I've seen chip designs + noise analysis + whatever take dozens of gigs.

Why don't they transfer the files over HTTPS?
Re:heh.. by hardburn · 2001-12-13 04:38 · Score: 2

Please tell me that data was encrypted first.

--
Not a typewriter
Re:heh.. by austad · 2001-12-13 04:46 · Score: 2

Why don't they transfer the files over HTTPS?

That's not secure either. SSL sessions can be intercepted. Just take a look at ettercap, search for it on freshmeat.net. After playing with this on your bridged DSL connection a little bit, you'll realize that SSL sucks.

Strong encryption via PGP or some other strong method would be the way to go.

--
Need Free Juniper/NetScreen Support? JuniperForum
Re:heh.. by MeerCat · 2001-12-13 04:51 · Score: 3, Informative

TCP/IP adds a 16-bit checksum to the packets. This will generally detect an error burst of 15 bits or less, if the data is uniformly distributed then this will accept a corrupt segment with probability 1 / (2^16-1). [Snader p70]. This was designed to catch bugs in routers etc. (which may write the wrong data when forwarding packets) rather than catch all data corruption on the wire.
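A minimal sketch of the 16-bit one's-complement checksum used by TCP (as described in RFC 1071), written for illustration rather than speed. The second print shows one kind of error it cannot catch: because it is a plain sum, reordering 16-bit words leaves it unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in (end-around carry)
    return ~total & 0xFFFF


sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
print(hex(internet_checksum(sample)))  # 0x220d

# A weakness: the sum ignores word order, so swapping two 16-bit words goes undetected.
print(internet_checksum(b"\x00\x01\xf2\x03") == internet_checksum(b"\xf2\x03\x00\x01"))  # True
```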

Depending on how much noise goes undetected at your physical layers, you should expect a TCP session to pass through incorrect data about once every 10^9 to 10^12 bytes (that's the metric I use) - and if this is unacceptable then your application layer should check and/or correct the data itself, bearing in mind the end-to-end argument for system design.
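At that rate, a 10 GB (10^10 byte) transfer would expect somewhere between 10 and 0.01 corrupted bytes, which is why an application-level end-to-end check matters. A minimal sketch, assuming the sender publishes a SHA-256 digest alongside the file (the choice of SHA-256 and the function names are illustrative):

```python
import hashlib
import io


def stream_digest(stream, chunk_size=1 << 20):
    """SHA-256 of a binary stream, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def file_digest(path):
    """Digest of a file on disk, e.g. after the transfer has finished."""
    with open(path, "rb") as f:
        return stream_digest(f)


# Sender publishes the digest; receiver recomputes it after the transfer and compares.
expected = stream_digest(io.BytesIO(b"payload"))
print(stream_digest(io.BytesIO(b"payload")) == expected)  # True
```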

T

--
I spent a lot of money on booze, birds and fast cars. The rest I just squandered. - George Best
Re:heh.. by aengblom · 2001-12-13 05:00 · Score: 1

It's a new hot tech solution actually

Security through Obscrutitty ;-)

--

So close and yet so far from the world's perfect ID number
Re:heh.. by Namarrgon · 2001-12-13 05:12 · Score: 2

Even Zmodem used a 32 bit CRC. Hell, why not just zip the thing?

--
Why would anyone engrave "Elbereth"?
Re:heh.. by Anonymous Coward · 2001-12-13 05:12 · Score: 0

Yes, I'm sure it was encrypted... using ROT0.
Re:heh.. by Anonymous Coward · 2001-12-13 07:15 · Score: 0

It seems like increasing the amount of a good transaction would be fairly difficult, but cancelling a bad transaction could be easy. A (D)DOS attack could easily make it impossible to get the whole 10GB across overnight. Could delaying the transaction be exploited for cash? Almost certainly. Maybe someone needs to start a script kiddie mutual fund to exploit this. :)
Re:heh.. by jeeryg_flashaccess · 2001-12-13 07:53 · Score: 1

I believe that showed up the other day in the story about Cat5 cable being used for guitars. Somebody was describing speed versus bandwidth. Heh.

--
Life is like pants... fit in or you don't fit in.
Re:heh.. by Jobe_br · 2001-12-13 08:33 · Score: 2

Here's the link in case you want to try this out:
http://freshmeat.net/projects/ettercap/
Re:heh.. by Thomas+Charron · 2001-12-13 08:44 · Score: 3, Interesting

Data transferred to various financial routines is routinely transmitted using insecure protocols in an unencrypted format. I've worked in at least 2 positions, one in a credit card clearinghouse, another at an ECN. This data goes over unsecured channels all the time.

--
-- I'm the root of all that's evil, but you can call me cookie..
Re:heh.. by cabinboy · 2001-12-13 08:53 · Score: 1

That's actually legal? How could they ever lose with that kind of power?
Re:heh.. by Snover · 2001-12-13 16:45 · Score: 1

Funny, I could've sworn that FTPs were resumable, and that sFTP could be used to secure FTP channels. Guess I was wrong. Or maybe these guys are just full of shit. Hmm...

--

[insert witty comment here]

doesn't leave a lot of room for error by baronben · 2001-12-13 02:02 · Score: 1

According to the article, the technology needs exactly the right kind of equation (or whatever this technology uses to carry information): according to the representative quoted in this article, if you got 98% of the packets, you don't have the file. I suppose this means there's a large chance that network conditions can completely mess up a download, say interference on a router somewhere in Kalamazoo, or even on a local Ethernet line. Not sure if this is a big thing or not, but who knows.

--
Sleep is for the weak!

DAP? by BIGJIMSLATE · 2001-12-13 02:02 · Score: 2

"User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name."

Uh...doesn't something like Download Accelerator Plus (yeah yeah, I know its a hive of spyware) already do that (downloads from multiple locations only to recombine the file later)?

yeah i remember... by jthm · 2001-12-13 02:02 · Score: 1

User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name.

the network is called usenet and the company was just broken up by the government.

--
nothing excels in every environment

XOLOX is the name by arnwald · 2001-12-13 02:03 · Score: 1, Interesting

The program was called xolox,
I know the developer personally and he is very disappointed about the corporate feedback he got.

People loved it, corporations didn't, so he shut down his site and with it Xolox ( unless you have a hacked version of course ;)

Cheers.

--
My other sig is Funny.

Re:XOLOX is the name by kz45 · 2001-12-13 07:18 · Score: 0

The program was called xolox,
I know the developer personally and he is very disappointed about the corporate feedback he got

This is NOT the name of the program you were trying to remember. This is a shameless plug for a dead gnutella client (I'm also tired of seeing plugs like this on the zeropaid boards).

There have been many more programs that could do this very thing, before xolox even started.

Download accelerator, limewire, bearshare, kazaa....just to name a few.

I think I know how this works by David+H · 2001-12-13 02:03 · Score: 1, Redundant

It looks like they just use UDP to "send" the original data and then follow it up with parity information until the "receiving" client gets enough parity data to reconstruct any missing original data. The parity files everyone has started using on Usenet are pretty cool, and this just sounds too similar.

Re:I think I know how this works by Anonymous Coward · 2001-12-13 05:23 · Score: 0

Do you have more info on those Usenet parity files?
I haven't heard of it and it sounds neat.
Thanks
Re:I think I know how this works by Anonymous Coward · 2001-12-13 10:20 · Score: 0

he may be talking about sfv "simple file verifier"

if i remember correctly it's a 32bit hash of the original file...

performs the same function as a checksum. nothing interesting here... move along
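For comparison with the parity files above: an SFV file just lists each file name with its CRC-32, which detects corruption but cannot repair it. A minimal sketch; the uppercase hex formatting and the file name are assumptions for illustration.

```python
import zlib


def crc32_of_chunks(chunks):
    """CRC-32 over a sequence of byte chunks, updated incrementally like a streaming reader."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def sfv_line(filename, crc):
    """One SFV-style line: file name, a space, then the CRC-32 as 8 hex digits."""
    return f"{filename} {crc:08X}"


print(sfv_line("part01.rar", crc32_of_chunks([b"1234", b"56789"])))  # part01.rar CBF43926
```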
Re:I think I know how this works by David+H · 2001-12-14 10:05 · Score: 1

See http://sourceforge.net/projects/parchive

Think Geek Geforce 3 Add by Anonymous Coward · 2001-12-13 02:05 · Score: 0

I wish they would quit saying my video card sucks since I own a Visiontek G3 and it ownz.

not always great by SylentBobb · 2001-12-13 02:05 · Score: 1

Well, one thing I noticed with Kazaa/Morpheus was that partially downloaded files were useless. I hated trying to download a movie, the file provider going offline (just rude), and then not even being able to watch the part I had downloaded. I don't know if there was a workaround for it. I just switched to using Direct Connect.

--
SylentBobb

Re:not always great by Anonymous Coward · 2001-12-13 02:25 · Score: 0

Most movie files can be watched even if you don't get the ending; you just need a program that copes with the missing end. mplayer for linux can play them all even without the ending, but 'mplayer -idx' rebuilds the index so you can fast-forward. mpg's can be watched with windows media player without fixing, and divx's can be fixed with some kind of divxfix (run a search, shouldn't be hard to find), again in windows. don't need to bother when using supreme software like mplayer for linux :).
Re:not always great by Anonymous Coward · 2001-12-13 04:02 · Score: 0

An easy work around for this (in Windows, dunno if it works in linux) is to simply rename the file. Replace the file extension with .avi or .mpg (or whatever), and it should work.

Michael, did you even read it? by Chirs · 2001-12-13 02:05 · Score: 5, Informative

Guys, this is nothing like Kazaa. Kazaa will let you download from several sources simultaneously, but only because it just requests different parts of the file from each source. At that point there are still send/ack type protocols in use.

This technology (from the write-up anyway) uses some kind of proprietary technique to re-map the data into another domain and send the information required to reproduce it. It sounds kind of like sending a waveform as a series of Fourier coefficients rather than as actual data samples. By changing to a different domain, it is possible to send metadata from which the original information can be recreated.

I have no idea exactly how they would do it, but it doesn't sound completely impossible.

However, it's nothing like Kazaa or GetRight.

Re:Michael, did you even read it? by no_such_user · 2001-12-13 03:01 · Score: 3, Informative

IANADP (i am not a dsp programmer...), but if I remember correctly, the process of the Fourier transform doesn't reduce data. The transform brings our data into the frequency domain, which, in terms of audio and video data, is where we do all of the tricks to eliminate what we've learned our "ears" can't hear or "eyes" can't see.
Re:Michael, did you even read it? by Merlin42 · 2001-12-13 03:15 · Score: 1

Actually, if you read to the end of the article, it specifically mentions XOR ... so this is really just a network-aware RAID 5 (sort of), i.e. each packet is a 'hard drive' in a RAID array, and if you lose a hard drive you can reconstruct the original. Now, RAID arrays are usually set up to survive only a single drive failure at a time, but it is mathematically possible to set one up so that, given M drives, any N of them (N < M) can be lost and the original can still be reconstructed.
Now this sounds like they are being 'greedy' in a 'netiquette' sense. If you ignore TCP flow control you can seriously screw up a network, but in the process you can get some insane levels of throughput.
I remember reading a paper a while back (forget where) where a researcher exploited some problems in the Windows TCP implementation (it would send out a new packet for every received ACK, so he ACKed each byte ...) to download IE in a matter of seconds, and kill the campus backbone at the same time.
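The RAID-5 comparison can be sketched with plain XOR parity. A minimal sketch, assuming equal-length blocks; a single parity block tolerates exactly one loss, and surviving more losses (the "lose N of M" case) needs stronger codes such as Reed-Solomon.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, byte: acc ^ byte, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """RAID-5-style parity: the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """XOR of the survivors and the parity equals the single missing block."""
    return xor_blocks(list(surviving_blocks) + [parity])


blocks = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(blocks)
print(rebuild_missing([blocks[0], blocks[2]], parity))  # b'efgh'
```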

--
Thoughts on tech, Software Engineering, and stuff
Re:Michael, did you even read it? by nobbis · 2001-12-13 11:23 · Score: 1

How much information is needed to communicate, say, a sine wave (at a given resolution) assuming we discretely sample the wave? Now, how much information is needed to communicate that same wave assuming we're sending Fourier coefficients.

Notice the difference in message length?
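The sine-wave point can be checked numerically. A minimal sketch using a naive DFT: 64 samples of a pure tone turn into 64 coefficients, but only one conjugate pair is nonzero. As the reply below notes, this only works because a pure tone is highly redundant; the transform itself does not shrink arbitrary data.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


N = 64
wave = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]  # 3 cycles over 64 samples
spectrum = dft(wave)
nonzero = [k for k, c in enumerate(spectrum) if abs(c) > 1e-6]
print(nonzero)  # [3, 61]
```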
Re:Michael, did you even read it? by John+Sullivan · 2001-12-14 02:08 · Score: 1

Notice the difference in message length?

For which message, exactly?

The information bandwidth of a time-domain channel at a given bitrate is the same as the bandwidth of a frequency-domain channel at the same bitrate. IF you're saturating one channel, then doing a fourier or inverse fourier transform will have no effect.

In practice, many messages (such as English text, photographic images, recorded sound) have a convenient representation which is highly redundant. Their true information content is much lower than the theoretical maximum bandwidth of the channel used to transmit them, in that particular encoding.

The reason why converting to the frequency domain works so well for some applications, is that it is currently easier for us to separate the 'signal' from the 'padding' in the frequency domain, then lower the channel bitrate accordingly to just above the signal bandwidth. This can be done with many real-world signals, and is in theory lossless.

In practice far bigger savings in bandwidth can be sometimes made by lowering the channel bandwidth to below that of the signal. Often, features of the signal which make little subjective difference to a human observer, have disproportionately high information content. By accepting a small degrading of the signal, a huge saving in bandwidth can be made.

This does not apply in general though. Signals which naturally have very high entropy (as in already close to the maximum bandwidth of the transmission channel), or that have already been compressed by some other means, or that simply have a structure which doesn't change energy distribution in a suitable way when mapping between domains, will benefit less or not at all from such mapping.
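The high-entropy point is easy to see with a general-purpose compressor. A minimal sketch using zlib as a stand-in (not a frequency-domain method): redundant text shrinks dramatically, while random bytes come out no smaller.

```python
import os
import zlib


def compressed_ratio(data):
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


text = b"the quick brown fox jumps over the lazy dog. " * 200  # highly redundant
noise = os.urandom(len(text))                                 # already close to maximum entropy
print(f"text:  {compressed_ratio(text):.3f}")
print(f"noise: {compressed_ratio(noise):.3f}")
```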

--
This is my World Wide Web of Whatever

Morpheus does this... by equalize · 2001-12-13 02:05 · Score: 1

When downloading something large (probably everything, just more noticeable when the file is large) Morpheus connects to different users with the same file.

Swarmcast ? by wintahmoot · 2001-12-13 02:06 · Score: 1

You must surely mean OpenCola's Swarmcast. hiro

--
Martin May

vector graphics by jas79 · 2001-12-13 02:06 · Score: 0, Offtopic

This sounds a lot like how vector graphics work. They don't transmit every pixel, but only send the coordinates and the instructions on how to draw the image.

Somehow they managed to do the same for applications. maybe they only send the sourcecode and compile the code on location

Swarmcast by Anonymous Coward · 2001-12-13 02:06 · Score: 0

The company which did that was OpenCola, with an open source product called Swarmcast, but as far as I know they have abandoned the project and cut the developers off their payroll. However, they are still going on at SourceForge.

I know a few of the developers, and one of them has started a service company selling services for Swarmcast, called Onion Networks.

AK

Re:Swarmcast by anticypher · 2001-12-13 04:10 · Score: 2

shouldn't be paying $150k for these types of technologies.

You haven't looked at the outrageous prices of Cisco equipment lately, have you? :-)

Assuming these boxes are for accelerating multicast applications and preserving WAN bandwidth, then US$70,000 per box sounds competitive with Cisco. Put a large one of these boxes at the headend of a multicast, and one box for each LAN where there are a number of receiving clients, and there could be considerable savings in WAN usage.

/. readers don't realize how much a WAN link costs, and how those costs jump in large multiples when a PHB demands more bandwidth for his pet project. Especially outside of the US, where international WAN links cost upwards of $10,000/month for 2Mbit/second, and adding another 2Mbit/sec will cost another $10,000 plus a long wait for turn-up. If the PHboss absolutely has to roll out a multicast application immediately, I'd throw a handful of these boxes on the network, and not bother with buying more bandwidth for a while.

the AC

--
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
Re:Swarmcast by Hast · 2001-12-13 06:10 · Score: 1

I looked into Swarmcast this summer (really cool stuff BTW) and it's based on the same papers. Both seem to use FEC (also used by satellite comms), at least I know Swarmcast does.
However, while Swarmcast was mainly a download utility, not unlike GetRight and similar tools, this system seems to be more general. The site hints that it should be possible to use for any net transfer. (Technically that would be possible with Swarmcast as well, but it wasn't implemented.)
I think this product is more of a "TurnKey" thing. But if you are interested in it then go and look at the code for Swarmcast. It's most likely quite similar in design.
Re:Swarmcast by Birdie-PL · 2001-12-13 06:24 · Score: 1

Furthermore, if you understand some of the maths behind it, you will note that this technique is very similar to one called information dispersal.

The application should be easy. Assuming you get a file that fits into N packets, you 'magically' make (1+epsilon)*N packets and send them. If the receiver got at least N packets it can reproduce the original file.
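The "make (1+epsilon)*N packets, keep any N or so" idea can be simulated with a random linear fountain over GF(2). This is a minimal sketch of that simpler relative, not Tornado codes or the product's proprietary scheme: each coded packet is the XOR of a random subset of source packets, and the receiver keeps collecting until Gaussian elimination can solve all N. Packets are modeled as integers for brevity.

```python
import random


def fountain_transfer(source, rng):
    """Collect random XOR-combination packets until all N source packets can be solved.

    Returns (decoded source packets, number of coded packets the receiver needed).
    """
    n = len(source)
    rows = {}  # pivot bit -> (coefficient mask, payload); each mask's highest set bit is its pivot
    received = 0
    while len(rows) < n:
        # Sender side: pick a random non-empty subset of source packets and XOR them together.
        mask = 0
        while mask == 0:
            mask = rng.getrandbits(n)
        payload = 0
        for i in range(n):
            if mask >> i & 1:
                payload ^= source[i]
        received += 1
        # Receiver side: reduce the new packet against rows already held (elimination over GF(2)).
        while mask:
            top = mask.bit_length() - 1
            if top not in rows:
                rows[top] = (mask, payload)
                break
            row_mask, row_payload = rows[top]
            mask ^= row_mask
            payload ^= row_payload
        # If mask reached 0, the packet was redundant and is simply discarded.
    # Back-substitution: a row with pivot b involves only bits <= b, so solve from bit 0 upwards.
    solved = [0] * n
    for b in range(n):
        row_mask, row_payload = rows[b]
        value = row_payload
        for c in range(b):
            if row_mask >> c & 1:
                value ^= solved[c]
        solved[b] = value
    return solved, received


rng = random.Random(42)
source = [rng.getrandbits(32) for _ in range(20)]  # 20 hypothetical 32-bit "packets"
decoded, needed = fountain_transfer(source, random.Random(7))
print(decoded == source, needed)  # True, plus the number of packets received (at least 20)
```

Which packets get lost does not matter; the receiver just keeps listening until it has enough independent ones, typically only slightly more than N.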

And, if you are clever enough, you can incorporate some tagging into the scheme, so the receiver is able to ask for retransmission of the part that is missing. Of course, if you use the scheme for streaming it's not much help. But if you do FTP, it is. Imagine sending 3 retransmissions for a 1 GB file.

--
e-mail: karol at tls-technologies.com
www: http://www.tls-technologies.com
sig: not found
Re:Swarmcast by hnchou · 2001-12-13 06:32 · Score: 1

Just out of curiosity. It will be very helpful if you could explain what contributes to the 3-5.5% efficiency improvement.
Re:Swarmcast by Orasis · 2001-12-13 12:29 · Score: 1

Sure. The Tornado codes achieve good performance by requiring 3-5.5% additional data beyond the original file size in order to reconstruct the original file. The Vandermonde codes that we use require no additional data, but require a bit more scheduling to get good performance. This scheduling is no problem over a point-to-point link.

Swarmcast by Webmonger · 2001-12-13 02:08 · Score: 4, Informative

I believe Michael's talking about OpenCola's Swarmcast.

Cool by athmanb · 2001-12-13 02:09 · Score: 1, Offtopic

Now you just need to combine that with the revolutionary algorithm to compress any data to one bit and power your computer by cold fusion, and you got one heck of a file transferring machine!

Re:Cool by biobogonics · 2001-12-13 05:16 · Score: 1

Now you just need to combine that with the revolutionary algorithm to compress any data to one bit and power your computer by cold fusion, and you got one heck of a file transferring machine!

All you need is a Turing machine with an infinitely long tape and an infinite amount of time to encode, transmit and decode the message.

Seriously, I expect this to make a big splash like the fractal based image compression technique which has now faded into obscurity.

Doesn't e-Donkey by cruelshoes · 2001-12-13 02:09 · Score: 1

User would go to download a game demo or something, receive pieces from several different places, and knit them together?

Doesn't e-Donkey grab from all kinds of different sources and then assemble the file?

By George, I believe it does.

Name of company and product by Omnifarious · 2001-12-13 02:10 · Score: 4, Informative

The company's name is OpenCola and the name of the product was SwarmCast. The guy who did SwarmCast, Justin Chapewske, is now at a company he started named Onion Networks. OpenCola appears to have completely abandoned its original Open Source approach to their software.

Apparently, Justin has taken the GPL portions of Swarmcast and is improving them at Onion Networks.

--
Need a Python, C++, Unix, Linux develop

Re:Name of company and product by Anonymous Coward · 2001-12-13 03:44 · Score: 0

SwarmCast isn't such a unique idea.
edonkey does it; fasttrack does it; limewire does it now; and soon more apps will do it. It just makes distributed sense.

Debian by Some+guy+named+Chris · 2001-12-13 02:11 · Score: 2

Debian does something similar with the Pseudo Image Kit.

It gets all the parts of the install ISO CD image from disparate sources, stitches them together, and then uses rsync to patch it into an exact duplicate of the original install image.
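The rsync step works because rsync can test every byte offset for a matching block cheaply, using a weak "rolling" checksum that updates in constant time as the window slides. A minimal sketch of that weak checksum as described in the rsync technical report; real rsync combines the two halves into one 32-bit value and confirms matches with a strong hash.

```python
M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block):
    """rsync-style weak checksum (a, b) of a block of bytes."""
    a = sum(block) % M
    # weights run L, L-1, ..., 1 from the first byte to the last
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte from the front, append in_byte at the end."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


data = bytes(range(256)) * 2
L = 16
a, b = weak_checksum(data[:L])
a, b = roll(a, b, data[0], data[L], L)
print((a, b) == weak_checksum(data[1:L + 1]))  # True: the O(1) update matches a full recompute
```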

Very nifty.

mod up plz by Anonymous Coward · 2001-12-13 02:11 · Score: 0

and it doesn't actually transfer the data.

Not a new concept by KarmaBlackballed · 2001-12-13 02:12 · Score: 2, Informative

The quirk is that none of the data is ever transmitted; the receiving end creates its own copy of a file based on a complete set of mathematical equations.

This is called compression. Everybody is doing it and it has been done before.

When you download a ZIP file, you are not downloading the content. You are downloading a mathematically transformed version of it. You then translate it back. Modems have been compressing and decompressing on the fly since the late 1980s.

Maybe they have a better compression scheme? (Fractal based?) That would be news. Everything else is a distraction.

--

--- -- - -
Give me LIBERTY, or give me a check.

Re:Not a new concept by Omnifarious · 2001-12-13 02:26 · Score: 4, Informative

UDP drops packets. What they are saying is they can packetize things in such a way that as soon as you pick up any N packets, you get the file, no matter what. They are also implying that anything less than N packets leaves you gibberish. This is quite different from file compression. It may be related to fractal file compression, but I think it's probably more related to cryptographic key sharing schemes.

--
Need a Python, C++, Unix, Linux develop
Re:Not a new concept by tzanger · 2001-12-13 02:29 · Score: 2

Maybe they have a better compression scheme? (Fractal based?) That would be news. Everything else is a distraction.

I remember back in the good old BBS days of a program called OWS which did "fractal compression"... In reality, it deleted the file and stored the sector information of the deleted blocks in the "compressed file" -- if you moved the .ows file off the disk, you'd get "decompression" errors. :-)
Re:Not a new concept by KarmaBlackballed · 2001-12-13 02:35 · Score: 0, Troll

Have you ever spanned disks with a ZIP file? Think of each disk as containing a "packet" of compressed information.

If you are missing a disk, yes the decompression fails.

Nothing to see here. Move along.

--

--- -- - -
Give me LIBERTY, or give me a check.
Re:Not a new concept by Omnifarious · 2001-12-13 02:38 · Score: 2

It won't work if they constantly resend the same packet over and over. They need it set up so that as soon as you get any N packets, you can reconstruct the file. This means that each packet needs to be different from all the other packets.

So if one side sends the other 6 packets before receiving the 'stop' request, and only 5 are needed to send the file, it doesn't matter which 5 of the 6 are received. No packets are sent from the receiver during the transfer, only at the beginning and end.

And, yes, it is a bandwidth hog, but for reasons that are rather different from the ones you imagine. It has no provision for congestion control, which means that it will keep blasting away with UDP packets at a congested router (keeping it congested) rather than backing off until the congestion has cleared.
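For contrast, the kind of back-off the post says is missing is TCP-style additive-increase/multiplicative-decrease (AIMD). A minimal sketch, assuming a simplified one-update-per-round-trip model; real TCP adds slow start, timeouts and per-ACK growth, and the parameter values here are illustrative.

```python
def aimd_step(cwnd, loss_seen, increase=1.0, decrease=0.5, minimum=1.0):
    """One round trip of additive-increase / multiplicative-decrease.

    cwnd is the congestion window in packets: grow it by `increase` per loss-free
    round trip, and multiply it by `decrease` (never below `minimum`) after a loss.
    """
    if loss_seen:
        return max(minimum, cwnd * decrease)
    return cwnd + increase


window = 1.0
trace = []
for loss in [False, False, False, True, False, True, True]:
    window = aimd_step(window, loss)
    trace.append(window)
print(trace)  # [2.0, 3.0, 4.0, 2.0, 3.0, 1.5, 1.0]
```

A sender that never takes the `loss_seen` branch keeps its rate up while the router drops packets, which is the behavior being criticized.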

--
Need a Python, C++, Unix, Linux develop
Re:Not a new concept by Omnifarious · 2001-12-13 02:40 · Score: 0, Flamebait

I'm beginning to think you are a purposeful idiot who is merely trying to post something that has the appearance of being intelligent (but really required no thought at all) in order to get karma and the +2 bonus to be a jerk with later.

--
Need a Python, C++, Unix, Linux develop
Re:Not a new concept by KarmaBlackballed · 2001-12-13 02:50 · Score: 0, Troll

Transporter Fountain creates not equations but hundreds of millions of "symbols" which can be used to reconstruct the data. The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols.

The power of this "press release" is that it fools readers like yourself into believing there is more there than the sum of the parts.

Read it dude. Think about it. Think huffman codes. Think fractal compression. Think run-length encoding. And yes, think disk spanning.

Don't be a j*erk. More important, don't be a pawn.

--

--- -- - -
Give me LIBERTY, or give me a check.
Re:Not a new concept by friscolr · 2001-12-13 02:55 · Score: 1

This sounds like something i read in either Disappearing Cryptography or The Data Compression Book (was a while ago, can't remember which one - either way, both decent reads), which i didn't fully understand at the time but kind of related to RAID-5, where you need fewer disks to rebuild a filesystem generally composed of a greater number of disks.

--
-f
www.blackant.net
Re:Not a new concept by Anonymous Coward · 2001-12-13 02:58 · Score: 0

I think they are simply using a way to minimize redundancy in the data. So if i want to transmit X=10 and Y=20, I can send :

1 X + 0 Y = 10 (sending 1,0,10)
0 X + 1 Y = 20 (sending 0,1,20)
1 X + 1 Y = 30 (sending 1,1,30)
2 X + 1 Y = 40 (sending 2,1,40)
etc

This way, once 2 packets arrive, I can solve the equations and determine X,Y. There will surely be some algorithm which keeps the redundancy minimal.
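With the four packets above, any two of them are enough. A minimal sketch using exact fractions and Cramer's rule (names are illustrative); every pair recovers X = 10, Y = 20 because no two of these equations are multiples of each other.

```python
from fractions import Fraction

# Each "packet" from the post: (coefficient of X, coefficient of Y, right-hand side)
packets = [(1, 0, 10), (0, 1, 20), (1, 1, 30), (2, 1, 40)]


def solve_pair(p, q):
    """Solve a1*X + b1*Y = c1 and a2*X + b2*Y = c2 exactly with Cramer's rule."""
    a1, b1, c1 = p
    a2, b2, c2 = q
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("these two packets carry dependent equations")
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)


for i in range(len(packets)):
    for j in range(i + 1, len(packets)):
        x, y = solve_pair(packets[i], packets[j])
        print(f"packets {i} and {j}: X = {x}, Y = {y}")  # every pair gives X = 10, Y = 20
```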

Klaas Naaijkens
Next2us.com
Re:Not a new concept by kasparov · 2001-12-13 03:06 · Score: 1

So instead of sending x=10 and y=20 (two theoretical packets) they send:
x+0y=10 and 0x+y=20 (two larger theoretical packets)
and this magically saves bandwidth how?

--
There's no place I can be, since I found Serenity.
Re:Not a new concept by Anonymous Coward · 2001-12-13 03:10 · Score: 0

x and y will be parameters calculated with some transform/compression

There will be two parts:
-compression/Fourier transform/...
-data transmission

UDP data transmission will not save bandwidth, for sure
And X and Y (Z, ...) can be very large, so the coefficients don't matter much.

Klaas Naaijkens
Next2us.com
Re:Not a new concept by GrEp · 2001-12-13 03:22 · Score: 2

A tech paper on their "tornado codes". Also, the link to their tech papers website.

Didn't have much chance to look over the algorithms, but we really should have compression used more frequently in network transport.

--

bash-2.04$
bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
Re:Not a new concept by Anonymous Coward · 2001-12-13 03:27 · Score: 0

No, it is not compression-- nowhere do they state that the amount of transmitted data is any smaller than the original amount.

Instead, it's redundancy-- send more than necessary so that it becomes acceptable to lose some data in transit. Check the Tornado paper for details.
Re:Not a new concept by pclminion · 2001-12-13 07:15 · Score: 1

Actually it's conceptually related to holographic imaging. In a hologram, information is evenly distributed across the film, and any finite piece of the film can be used to reconstruct the original image. Of course, the more pieces you have, the more accurate the reconstruction is.
This method is somewhat analogous. The information in the file to be transmitted is spread uniformly across a number of packets, with redundancy. Once you have acquired enough packets, you can accurately reconstruct the original file. The difference between this and holography is that in holography you can reconstruct a usable image from very few "packets": it will just be fuzzy and less distinct than if you had all the packets. In this method, you must have a certain minimum number of packets to reconstruct the file.

Re:OOPS, name of person slightly wrong by Omnifarious · 2001-12-13 02:13 · Score: 3, Informative

Oops, make that Justin Chapweske. That's what I get for typing out an odd name from memory. :-)

--
Need a Python, C++, Unix, Linux develop

Yea, Right. by Anonymous Coward · 2001-12-13 02:13 · Score: 1, Insightful

"FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes," said Charlie Oppenheimer

Who does this guy think he is kidding?? We regularly FTP AutoCAD files of 100 megs and ISO images of 500 megs with no issues whatsoever.

Sure, it might take an hour or so to complete, but that beats the hell out of FedEx, and it's a lot cheaper too. This guy's been smoking a few too many of his own marketing brochures.

Re:Yea, Right. by boltar · 2001-12-13 02:29 · Score: 0

Probably a typo. I suspect he meant 20 gigs. At least I hope he did!
Re:Yea, Right. by Anonymous Coward · 2001-12-13 02:45 · Score: 0

And I take it they have never heard of adjusting your TCP window size. I'm running a window size of 512KB at home. Makes a big difference.

Fountain by Anonymous Coward · 2001-12-13 02:14 · Score: 0

You'd think if they were going to call their product Fountain, and try to name it something out of Star Trek, they would at least call it "Particle Fountain" which actually appeared in at least 1 episode of TNG, I think.

Oh, wait, the particle fountain blew up and didn't work so well...

Nevermind ;)

Glenn

One of two by Looke · 2001-12-13 02:14 · Score: 1

This is one of two:

Yet another compression algorithm
A perpetuum mobile

Guess what?

Re:One of two by Anonymous Coward · 2001-12-13 09:12 · Score: 0

Wrong answer. Number one is pretty close: it's a transformation algorithm, but it doesn't guarantee protection.

"Game Demo" by SkOink · 2001-12-13 02:15 · Score: 1

Well, Morpheus will let you download gigs of these so-called "Game Demoz" for free from multiple sources at the same time!

--
---- I'll take you in a Hunt deathmatch any day.

UDP? Not always good by darrad · 2001-12-13 02:15 · Score: 1

From what I have seen in the past, using UDP is not always a good thing. Many of the major backbone providers, and a lot of ISPs, block UDP traffic at different times for many different reasons (Smurf attacks, DoS). This can lead to several services being shut down.

The idea itself sounds good. You more or less send a description of the file in a mathematical equation. If the equation itself is smaller in size than the file, great.

Re:UDP? Not always good by Anonymous Coward · 2001-12-13 02:18 · Score: 0

Maybe your ISP blocks UDP but, certainly none of the "major backbone providers". If they did, you wouldn't even be able to resolve a DNS name (UDP 53).

The internet would cease to function if UDP were blocked.
Re:UDP? Not always good by jidar · 2001-12-13 06:25 · Score: 2

Uh.. you mean ICMP I think. Widespread blocking of UDP isn't something I've noticed. That would suck considering 90% of the games people are playing online are UDP.

--
Sigs are awesome huh?
Re:UDP? Not always good by darrad · 2001-12-13 06:46 · Score: 1

I think you are right. Massive brain cloud here or something. Ah well, I stand corrected.

Basically by glowingspleen · 2001-12-13 02:16 · Score: 2

For anyone who just wants the gist of the article:

"The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. "

So basically, it's not much more than UDP with a single reply telling the server to stop transmitting.

Not bad, but you'd better have some good timeouts worked into this thing. UDP by definition is a non-replying "if it gets dropped, who cares?" protocol. If the receiver's connection were to go down, wouldn't the server just keep flooding all the in-between routers with packets for a while? That's not good for traffic congestion.

--

------
Let me give you the lowdown

Bad Netizens? by mattrwilliams · 2001-12-13 02:16 · Score: 1

Isn't one of TCP's purposes to throttle connections when loss (=contention) in the core starts to affect a stream? This is a method by which multiple users can share the same public network without adversely affecting one another. This technology looks like it is working around this problem by adding redundancy to the original data and then flooding the network, ignoring any indications of contention. This smells pretty selfish to me and could cause problems for the public internet if this technology ever takes off in large enough numbers.

--
The generation of random numbers is too important to leave to chance

Just send numbered UDF Packats by seanmceligot · 2001-12-13 02:17 · Score: 2, Interesting

This could be done easily without the proprietary algorithms. Just send UDP packets with a header in each one stating that it is packet number N and there are X total packets. Then, request the missing packets when you get towards the end, and put them all together in order once you have them all.
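The numbered-packet idea is simple enough to sketch end to end. This is a minimal Python simulation, assuming a hypothetical 8-byte header (packet number N and total count X) and a lossy channel modeled with a seeded random number generator rather than real sockets. The receiver reports the holes, and the sender resends only those.

```python
from __future__ import annotations

import random
import struct

# Hypothetical header: packet number N and total packet count X, network order.
HEADER = struct.Struct("!II")


def packetize(data: bytes, size: int) -> list[bytes]:
    chunks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
    return [HEADER.pack(n, len(chunks)) + c for n, c in enumerate(chunks)]


class Receiver:
    def __init__(self) -> None:
        self.total: int | None = None  # unknown until the first packet lands
        self.chunks: dict[int, bytes] = {}

    def accept(self, packet: bytes) -> None:
        n, total = HEADER.unpack_from(packet)
        self.total = total
        self.chunks[n] = packet[HEADER.size:]  # duplicates simply overwrite

    def missing(self) -> list[int]:
        """Packet numbers to request again (the 'resend' list)."""
        if self.total is None:
            return []
        return [n for n in range(self.total) if n not in self.chunks]

    def complete(self) -> bool:
        return self.total is not None and not self.missing()

    def assemble(self) -> bytes:
        return b"".join(self.chunks[n] for n in range(self.total))


def transfer(data: bytes, size: int = 512, loss: float = 0.3, seed: int = 1):
    """Simulate the exchange over a lossy channel; returns (data, rounds)."""
    rng = random.Random(seed)
    packets = packetize(data, size)
    rx = Receiver()
    wanted = list(range(len(packets)))  # round 1: send everything
    rounds = 0
    while not rx.complete():
        rounds += 1
        for n in wanted:
            if rng.random() >= loss:  # this copy survived the channel
                rx.accept(packets[n])
        # Ask only for the holes; if nothing arrived at all, resend everything.
        wanted = rx.missing() if rx.total is not None else wanted
    return rx.assemble(), rounds
```

The cost of this design is the extra round trips spent on resend requests; a fountain-style code avoids those by making every packet equally useful.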

Somewhat unrelated: does anyone else miss Z-Modem? We need a zmodem-like program that works over telnet so we don't have to open a separate FTP session. In the BBS days, you just typed rz filename and it came to you.

Re:Just send numbered UDF Packats by Mr.+Slippery · 2001-12-13 02:31 · Score: 1

Does anyone else miss Z-Modem? We need a zmodem-like program that works over telnet so we don't have to open a separate FTP session.

I have seen telnet clients with x/y/zmodem or kermit built in. Useful in certain help-I'm-trapped-behind-a-dumb-firewall situations.

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Re:Just send numbered UDF Packats by biobogonics · 2001-12-13 05:30 · Score: 1

Somewhat unrelated: does anyone else miss Z-Modem?

The reason that Zmodem was faster than other types of file transfer was that it eliminated the handshaking and acknowledgement overhead of earlier protocols like xmodem, which operated on 128 byte CP/M style "records". Instead it sent the file in one chunk with ack at the end. You had to have compatible hardware error correction on both sending and receiving modems. In fact, this often involved wading through various modem setup strings and s-register values. If you lost one bit or dropped carrier, then your file download was history. Since TCP/IP is packet switched rather than involving transmission of individual characters, this changes things quite a bit.
Re:Just send numbered UDF Packats by Turing+Machine · 2001-12-13 06:31 · Score: 1

Instead it (ZMODEM) sent the file in one chunk with ack at the end. You had to have compatible hardware error correction on both sending and receiving modems.....If you lost one bit or dropped carrier, then your file download was history.

You are mistaken. ZMODEM not only recovered from transmission errors gracefully, it would even recover from a dropped carrier, resuming the transmission at the point it aborted.

I think you are thinking of YMODEM-G.

my take by bpowell423 · 2001-12-13 02:17 · Score: 5, Informative

There have been lots of comments along the lines of, "this is just a novel compression/transmission scheme". In a way, that looks to be true, but here's my take.

Judging from this:

The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. The receiving box then performs an XOR operation on the symbols to derive the original data.

It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. Apparently there are multiple solutions to the equations that will arrive at the same answer, so when the receiving end has received enough symbols to make it work, it says, "stop sending already!" Apparently they're getting their speed because A) things don't have to arrive in any order (that's how the 'net is supposed to work, right?) and B) Alice and Bob don't have to keep up this conversation:

Alice: Hey, Bob, can you send me X?
Bob: Okay, are you ready?
A: Yes, go ahead.
B: Okay, here it comes.
A: I'm waiting.
B: Here's the first packet.
A: What? That packet didn't make it over.
B: Okay, here it is again.
A: Okay, I got that packet.
B: Good.
A: Okay, I'm ready for the second packet.
B: Okay, here's the second packet.

Okay, I had too much fun with the Alice and Bob conversation there. Anyway, it looks like their scheme is compressing things in the form of their equations and then just sending them in a burst until the receiver is happy.

Sounds like it might work, but it'll generate a ton of network traffic, I'd bet!
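To make the "send symbols until the receiver says stop, then XOR" description concrete, here is a toy fountain code in the spirit of LT (Luby Transform) codes, sketched in Python. It is an illustration under stated assumptions, not Digital Fountain's actual algorithm. Each symbol is the XOR of a random subset of fixed-size blocks, and the receiver peels symbols apart with XOR until every block is known. Lost symbols are never retransmitted; the sender just keeps producing fresh ones.

```python
import random


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encoder(blocks, seed=0):
    """Endless stream of (indices, payload) symbols, each the XOR of a random
    subset of source blocks; subset sizes follow the ideal soliton distribution."""
    rng = random.Random(seed)
    k = len(blocks)
    degrees = list(range(1, k + 1))
    weights = [1 / k] + [1 / (d * (d - 1)) for d in range(2, k + 1)]
    while True:
        d = rng.choices(degrees, weights)[0]
        idx = rng.sample(range(k), d)
        payload = blocks[idx[0]]
        for i in idx[1:]:
            payload = xor(payload, blocks[i])
        yield idx, payload


class PeelingDecoder:
    """Recover blocks by repeatedly reducing symbols to a single unknown."""

    def __init__(self, k):
        self.k = k
        self.known = {}    # block index -> recovered bytes
        self.pending = []  # [set of still-unknown indices, payload]

    def add(self, indices, payload):
        self.pending.append([set(indices), payload])
        changed = True
        while changed:
            changed = False
            for entry in self.pending:
                idx = entry[0]
                for i in [i for i in idx if i in self.known]:
                    entry[1] = xor(entry[1], self.known[i])  # XOR known blocks out
                    idx.discard(i)
                if len(idx) == 1:  # exactly one unknown left: it's solved
                    self.known[idx.pop()] = entry[1]
                    changed = True
            self.pending = [e for e in self.pending if e[0]]

    def done(self):
        return len(self.known) == self.k


def send_until_decoded(data: bytes, block_size=16, loss=0.2, seed=0):
    """Sender streams symbols; receiver says 'stop' once it can rebuild the data.
    Returns (recovered data, symbols sent, number of source blocks)."""
    if not data:
        return b"", 0, 0
    padded = data + bytes(-len(data) % block_size)
    blocks = [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
    rx = PeelingDecoder(len(blocks))
    channel = random.Random(seed + 1)
    sent = 0
    for idx, payload in encoder(blocks, seed):
        sent += 1
        if channel.random() < loss:
            continue  # dropped, and nobody asks for it again
        rx.add(idx, payload)
        if rx.done():
            break  # "stop sending already!"
    out = b"".join(rx.known[i] for i in range(len(blocks)))
    return out[:len(data)], sent, len(blocks)
```

The ideal soliton distribution used here is the simplest choice and is known to be fragile in practice; real LT codes use the robust soliton distribution so that only slightly more than k symbols are needed.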

Re:my take by satanami69 · 2001-12-13 02:29 · Score: 1

Better than that, why not just wait until we get a working version of Normality
Of course, that may be a very long wait, but the compression would rule!

--
I really hate Dan Patrick.
Re: my take by Unfallen · 2001-12-13 03:21 · Score: 1

There have been lots of comments along the lines of, "this is just a novel compression/transmi[ss]ion scheme"

Well yes, as would almost any advance in data transmission be. The idea is to get information from one place to another as fast as possible, so reducing the size makes it faster...

For me, the important difference to note here is that:

"We send recipes, not pieces of content..."

So with "symbols", the sender is informing the receiver how to create the file. The only similar paradigm I can think of right now is the DCT techniques used in JPEGs, whereby the pixels are approximated and then described via a combination of mathematical waves. But I'm very tired, so there are probably loads more...

Without knowing the intimate details, it's hard to say how much of an improvement over current FTP methods this yields. I suspect that the gain would depend not only on the size of the file but also on its type: the same two factors that affect just about any other form of compression (Lempel-Ziv, to name the least complex). As the article says:

Transporter Fountain was created to handle large, long-distance file transfers

(my emphasis) - the situations described here, such as back-up information (long streams of continuous data), would naturally be much better suited to this approach.

Unfortunately, the article doesn't go into more complex details about how the symbols are sent (a UDP stream is implied, and makes sense if you only need a subset of the generated symbols to regenerate the file). Time to dig...
Re:my take by mr3038 · 2001-12-13 05:30 · Score: 3, Interesting

It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. ...don't have to keep up this conversation: [long conversation removed]
It seems to me that the article indeed speaks about a network that has high latency but high bandwidth with some loss. How about simply compressing the data and using bigger packets to transmit it? If you can use a big enough window while sending data, you can push all the data onto the network at the beginning. The conversation becomes: A) Here it comes. A) Done [64 packets, 125MB]. B) Okay, listening. B) Resend 2, 7 and 25. A) Done [3 packets, 6MB]. B) OK. Note that A starts sending before getting a reply from B. In fact, with a fast long-distance connection, A could finish sending before B even receives "Here it comes".
I think if we want to speed up file transfer, we need an API to tell the OS that we're going to send lots of data, so it should use big packets, or the opposite. Currently we just open a socket connection to the destination and start write()ing. The OS has no way to guess whether we're going to write 100 or 10e8 bytes. We need a way to tell the OS that the data we're sending isn't worth a dime until it's all done, so it should use big packets to minimize the bandwidth wasted on TCP control traffic.
You can opt to waste bandwidth to reduce perceived latency, and that's what I think is done here. A sends the file twice, and if some packets are lost, the second copy is used to fill in the missing parts. A has sent the missing packet before B even knows it's missing. Yes, A wasted half the bandwidth on redundant data that got to the destination correctly the first time, but we aren't interested in that. The key here is to use UDP so that lost packets are really lost instead of being automatically resent. This kind of setup increases overall throughput only if latency is the only problem in your network. Perhaps this will be needed in the future, because increasing bandwidth is easy (not necessarily cheap), but the light crawling inside fiber is so damn slow!

--
_________________________
Spelling and grammar mistakes left as an exercise for the reader.
Re:my take by MrBoring · 2001-12-13 07:33 · Score: 1

I must be missing something. If you eliminate ACK packets, and reduce the amount of regular packets to "symbol" based packets, how can that possibly generate more traffic? This is an honest question, I'm not trying to insult anyone.
Re:my take by Courageous · 2001-12-13 07:34 · Score: 2

It seems to me that the article indeed speaks about a network that has high latency but high bandwidth with some loss.

You are almost certainly correct. Similar technologies that I've looked at in the past expand individual packet sizes by about 20% in order to do forward error correction and thereby save on NACK/resends. This becomes particularly valuable when latencies are long, of course, which is what you guessed the application is for.

What's not said here is how much of the perceived problem they described is really just a problem with TCP/IP windowing. When effective bandwidth drops to a few percent of the channel bandwidth, this is almost always because the TCP/IP window buffer has been overrun.

This, of course, is particularly noticeable on high bandwidth, very high latency networks of the type you see on some satellites.

C//
Re:my take by Electrum · 2001-12-13 13:53 · Score: 1

I think if we want to speed up file transfer, we need an API to tell the OS that we're going to send lots of data, so it should use big packets, or the opposite. Currently we just open a socket connection to the destination and start write()ing. The OS has no way to guess whether we're going to write 100 or 10e8 bytes. We need a way to tell the OS that the data we're sending isn't worth a dime until it's all done, so it should use big packets to minimize the bandwidth wasted on TCP control traffic.

Actually, that already exists. TCP/IP does this using the Nagle algorithm: while earlier data is still unacknowledged, the stack holds back small writes and coalesces them, sending once a full-sized segment has accumulated or the outstanding data has been ACKed. If a full segment's worth of data is ready, it goes out immediately. Interactive applications like telnet disable Nagle, as you obviously wouldn't want to wait before sending a keystroke. If you're sending a file, you can essentially write the entire file to the socket at once. In fact, several OSes like Linux and FreeBSD have the sendfile() call, which avoids application overhead by having the kernel directly transfer data between file descriptors.
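The interactive/bulk split described here maps directly onto Python's standard socket module. A minimal sketch, assuming a Unix-like host and Python 3.8+: `make_interactive` sets `TCP_NODELAY` to turn Nagle off, while `send_file` leaves Nagle alone and uses `socket.sendfile()`, which is backed by `os.sendfile()` where the platform provides it. The helper names and the loopback `demo` are illustrative, not from the thread.

```python
import os
import socket
import tempfile
import threading


def make_interactive(sock: socket.socket) -> None:
    """Interactive traffic (telnet-style keystrokes): turn Nagle off."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def send_file(sock: socket.socket, path: str) -> int:
    """Bulk traffic: leave Nagle on and hand the whole file to the kernel.

    socket.sendfile() uses os.sendfile() where available and falls back to
    ordinary send() calls elsewhere.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo(payload: bytes) -> bytes:
    """Push payload through a loopback TCP connection via send_file()."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    received = bytearray()

    def receiver() -> None:
        conn, _ = server.accept()
        with conn:
            while chunk := conn.recv(65536):
                received.extend(chunk)

    thread = threading.Thread(target=receiver, daemon=True)
    thread.start()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        with socket.create_connection(("127.0.0.1", port)) as client:
            send_file(client, tmp.name)
        thread.join()  # receiver stops at EOF once the client closes
    finally:
        os.unlink(tmp.name)
        server.close()
    return bytes(received)
```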
Re:my take by mr3038 · 2001-12-14 02:42 · Score: 2

write the entire file to the socket at once [...] the sendfile() call
(man sendfile, man 7 tcp) Hmmm.... The TCP_CORK socket option seems interesting too. Unfortunately, the man pages also say: "Other Unixes often implement sendfile with different semantics and prototypes. It should not be used in portable programs." and "TCP_CORK is new in 2.2". Does POSIX have any support for directing the TCP/IP stack about the content of the data to optimize packet size and such, or is there some other way to do this portably?

--
_________________________
Spelling and grammar mistakes left as an exercise for the reader.
Re:my take by Electrum · 2001-12-14 03:17 · Score: 1

It would be nice if sendfile() were portable, but I don't think it's that big of a deal. Its main use is for web serving. For that, you're probably on a FreeBSD box, and it works quite nicely there. It's not difficult to put in some conditional code so that it uses sendfile if available, and conventional methods if not. Zeus does this, for example. You shouldn't need to optimize the TCP/IP stack behavior at the application layer; that completely defeats the purpose of the BSD socket model. There are basically two types of applications: interactive and data transfer. With an interactive application, you disable Nagle, because latency is more important than bandwidth and throughput. With data transfer, you just write as much data to the socket as possible, and let the kernel do its job. sendfile() is very nice for this, because you avoid the overhead of reading the data into memory first. This won't usually be a lot, but it adds up when you are pushing a lot of data. It is just a guess, but on systems without sendfile(), you should be able to get pretty close to its performance by mmap()'ing the file instead of reading it. If you are serious about network programming, then I highly recommend reading W. Richard Stevens' UNIX Network Programming.

File Resume by The_Flames · 2001-12-13 02:17 · Score: 0

Anything that uses the MusicCity protocol downloads from multiple available sources at the same time; if you're downloading from the web, download managers like GetRight etc. are also available to do a similar task.

--

--
The computer told me to press any key to continue,I pressed the one looking like this (|) !!OH SH*T!!

Technique: FEC (Forward Error Correction) by Anonymous Coward · 2001-12-13 02:18 · Score: 1, Insightful

These guys are implementing a Forward Error Correction mechanism for compression. It's all from research of Michael Luby at Berkeley with FEC and Tornado codes. He is a co-founder of the company. Pretty effective technology for certain applications.

Details by adam303 · 2001-12-13 02:18 · Score: 1

That article leaves a lot to wonder about. Is it just some data compression? I don't think there's anything left to be done in that field that hasn't been discovered yet. Is it a protocol that sends a lot of UDP packets and parity packets so you can fill in the missing blanks? Then you have to deal with bandwidth throttling, because you can't just have some machine sending out UDP as fast as it can to your smaller pipe; it will cause some DoS. Anyone have links to their patents? adam

fastest by cr@ckwhore · 2001-12-13 02:20 · Score: 4, Funny

Someday when we all have extraordinarily fast computers, we'll simply be able to send somebody an MD5 sum and the computers will be able to "crack" it back into the original file. At that point, commercial software wouldn't even have to come on a CD... just print the hash on a slip of paper and the user could type it in.

word.

--
Skiers and Riders -- http://www.snowjournal.com

Re:fastest by pointym5 · 2001-12-13 02:26 · Score: 1

No, they won't, because given any MD5 hash there are an infinitude of files that hash to it.

Now if you also send the file size, you reduce the possibility of collision.

But there's still a minor problem of iterating through the set of candidates. If you send me the MD5 hash for a 500KB file, you'll need to get cranking on computing the MD5 hash of each 4-million-bit number. Checking all 2^4000000 or so candidates will take a few lifetimes of the universe with any computing device bounded by time quanta.
Re:fastest by tzanger · 2001-12-13 02:35 · Score: 2

Someday when we all have extraordinarily fast computers, we'll simply be able to send somebody an MD5 sum and the computers will be able to "crack" it back into the original file.

Not familiar with hashes? You can't do it quite like that because an MD5 hash is the same for any number of datasets. Trying to un-do a hash is pretty idiotic at that level.

However, think about this scenario: send the MD5, byte count and CRC64. Now rattle through all the possible combinations of "byte count" bytes of data that generate the same MD5 and CRC64 values as were sent. It's still not foolproof, but you've reduced the dataset considerably. Of course, you then compare your new file to the GPG sig that was sent to make sure you got it right.

Perhaps as we approach quantum computing it will become more realistic to brute-force a many-hundred-megabyte file from a number of its hashes.
Re:fastest by jrockway · 2001-12-13 02:37 · Score: 1

I wrote a program like this :) The problem is that MD5 sums don't really have enough information, nor do MD5 + length (of the result). Think about it: MD5 has 128 bits. As soon as you exceed that, you'll get duplicates. And that would mean the wrong file.
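The counting argument can be watched in miniature. A small Python sketch, assuming we shrink MD5 to its top 16 bits so that collisions show up quickly; the helper names are made up for illustration. At full size the same pigeonhole logic says a 128-bit digest has 2^128 possible values while there are 2^4,096,000 distinct 500 KB (512,000-byte) files, so each digest value is shared by about 2^(4,096,000 - 128) files of that size on average.

```python
from __future__ import annotations

import hashlib
from itertools import count


def truncated_md5(data: bytes, bits: int) -> int:
    """Top `bits` bits of the 128-bit MD5 digest."""
    digest = int.from_bytes(hashlib.md5(data).digest(), "big")
    return digest >> (128 - bits)


def find_collision(bits: int = 16) -> tuple[bytes, bytes]:
    """Hash distinct messages until two share a truncated digest.

    Only 2**bits outputs exist, so the pigeonhole principle guarantees a hit
    within 2**bits + 1 messages (in practice much sooner, thanks to the
    birthday effect).
    """
    seen: dict[int, bytes] = {}
    for i in count():
        msg = str(i).encode()  # distinct message for every i
        h = truncated_md5(msg, bits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
```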

--
My other car is first.
Re:fastest by gowen · 2001-12-13 02:43 · Score: 1, Funny

Not familiar with hashes?
Not familiar with comedy?

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:fastest by addaon · 2001-12-13 03:11 · Score: 1

Why would any device be bounded by time quanta? Only serial devices are bound like this... a 100-way parallel computer, for instance, can provide 100 operations per quantum. Extend this to n-way, for n large, and you can do a heck of a lot in an absurdly small amount of time; the next limit you have to look at is the number of particles, assuming each particle is in a finite number of states (that is, assuming we're not doing quantum computing. Why wouldn't we be?).

--

I've had this sig for three days.
Re:fastest by anthony_dipierro · 2001-12-13 04:17 · Score: 1

No, they won't, because given any MD5 hash there are an infinitude of files that hash to it.

But how many of those files will actually compile?
Re:fastest by ralmeida · 2001-12-13 04:53 · Score: 1

Just send the positions in the binary expansion of pi where the file starts and where it ends (assuming the digits of pi are truly random, so that every combination of digits can be found).

--
This space left intentionally blank.
Re:fastest by Galvatron · 2001-12-13 06:24 · Score: 1

Presumably an infinite number. Even if only one in a billion random strings of bits compiled, one one billionth of infinity is still infinity. Of course, there's no way to compile an infinite number of programs, so I guess we'll never know for certain (unless someone can come up with a mathematical proof one way or the other).

--
"The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
Re:fastest by ruvreve · 2001-12-13 07:09 · Score: 1

Me and my beowulf cluster are ready.
Re:fastest by anthony_dipierro · 2001-12-13 07:20 · Score: 1

Even if only one in a billion random strings of bits compiled, one one billionth of infinity is still infinity.

But there aren't an infinite number of potentially compilable bit strings, since there is a limit on the maximum file size. Besides, as was said by the original poster, we could simply send the size of the file along with the bit string. If you assume that every file is a valid gzipped tar of a directory in which one must only type "./configure; make; make install", I bet that would work as a compression algorithm. Assuming of course that you had the processing power to attempt to compile all bit strings of length X with MD5 Y, that is :).
Re:fastest by cduffy · 2001-12-13 07:58 · Score: 2

Heh... that raises an interesting question.

If the set of "runnable programs" is small enough, it would be interesting to be able to find all runnable programs of less than a given length having a given MD5 sum. (Of course, strings, images and such would complicate the situation drastically -- so let's just leave those out and use this for sending code segments only).

Of course, using a hash created explicitly to be one-way doesn't make much sense in such a situation... but just as a thought experiment it's interesting.
Re:fastest by Bronster · 2001-12-14 01:20 · Score: 2

Not familiar with hashes? You can't do it quite like that because an MD5 hash is the same for any number of datasets. Trying to un-do a hash is pretty idiotic at that level.

So most of those cracked datasets are going to be invalid files for the format (which we can deduce from the 3-letter filename extension, usually .tmp or something, depending on your email client's handling of temporary filenames ;( ).

Any decoded files can then be checked (it's supposed to be a JPEG; does it look anything like a picture? No? Not it, then.)

Anything that passes all those filters can then be sold as art, there's gotta be some sucker out there willing to buy.

Meanwhile, I have a super fast computer which can crack MD5s into lots of art forms, and I would proceed to break into the FBI before getting shot in the head (but not before getting a blowjob from the yummy blonde (there's always one)).

dropped packets.. no problem? by ianxm · 2001-12-13 02:20 · Score: 1

The article says that they don't care if packets get dropped, as long as the right number of packets get transmitted.

Aren't the packets unique? If a packet gets dropped, how do they know which one to resend?
It doesn't make sense to me.

Re:dropped packets.. no problem? by night37 · 2001-12-13 03:07 · Score: 1

They use an analogy that goes against what they say later,

"Once you get enough [of the packets] coming in, Spock appears. If you get 98 percent of the packets, you get nothing."
Then later they say,

"The arrangement saves time because neither side cares if a packet gets dropped, thus eliminating the dialogue required by TCP and FTP. "
You have to have all of the data. That's part of the definition of a download. It looks like they're saying "If you don't get it all, you have to try again".

Or am I missing something here?
Re:dropped packets.. no problem? by Happy+Monkey · 2001-12-13 05:56 · Score: 2

There are m unique packets, but any group of n of them will decode to the file, where n is less than m. So if a packet is dropped, it's OK, as long as no more than m-n are dropped. Essentially, there is redundant data, and the only response that needs to be sent is the "I've got it" response.
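The "any n of m" property can be demonstrated with a textbook technique: Reed-Solomon-style erasure coding by polynomial interpolation. A minimal Python sketch, assuming byte-valued data and arithmetic in the prime field GF(257); this is a stand-in to illustrate the idea, not Digital Fountain's algorithm. The n data values are treated as a polynomial's values at x = 0..n-1, extra packets are that polynomial evaluated at further points, and any n packets pin the polynomial down again.

```python
from __future__ import annotations

P = 257  # smallest prime above 255, so every byte value is a field element


def interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Value at x of the unique polynomial of degree < len(points) passing
    through the given (xi, yi) points, computed in GF(P) (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse
    return total


def encode(data: list[int], m: int) -> list[tuple[int, int]]:
    """n data values -> m packets (x, y), with m <= P.

    The first n packets are the data itself; the remaining m - n are redundancy.
    """
    base = list(enumerate(data))
    return [(x, interpolate(base, x)) for x in range(m)]


def decode(received: list[tuple[int, int]], n: int) -> list[int]:
    """Rebuild the n data values from any n packets with distinct x."""
    points = received[:n]
    return [interpolate(points, x) for x in range(n)]
```

One wrinkle: a redundancy packet can take the value 256, which doesn't fit in a byte; practical Reed-Solomon codes work in GF(2^8) instead to avoid this.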

--
__
Do ya feel happy-go-lucky, punk?

Entry level is $70k by joshv · 2001-12-13 02:20 · Score: 2

Yeah, I don't think FTP is so slow that anyone is going to pay $70k for their proprietary 'Transporter Fountains'. It seems like anyone with a little common sense and math ability could easily cobble together a UDP-based software transfer protocol that has all of the properties described in the article.

The key is to build in redundancy without increasing the amount of data sent so much that you counteract the speed gains you get by using UDP.

-josh

Re:Entry level is $70k by Anonymous Coward · 2001-12-13 04:20 · Score: 0

The key is to build in redundancy without increasing the amount of data sent so much that you counteract the speed gains you get by using UDP.

Oh, is that all that's required?

Can you say, "impossible"?

FTP already provides for this by autocracy · 2001-12-13 02:20 · Score: 2, Informative

Just send REST (restart at) commands to many different servers, then cat the files onto each other. This is how Download Accelerator does it, and FastTrack is the same theory. The programs just take all the mental work out of it.
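The restart-and-concatenate approach is easy to sketch. A minimal Python version, assuming a caller-supplied `fetch(source, offset, length)` function (hypothetical; with FTP it would send `REST offset` before `RETR` and read only `length` bytes) and that every mirror holds an identical copy of the file. Each source serves one contiguous range in parallel, and the pieces are joined in offset order.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical hook: fetch(source, offset, length) -> bytes for that range.
Fetch = Callable[[str, int, int], bytes]


def plan_segments(size: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, size) into `parts` contiguous (offset, length) ranges."""
    base, extra = divmod(size, parts)
    segments, offset = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder
        segments.append((offset, length))
        offset += length
    return segments


def segmented_download(size: int, sources: list[str], fetch: Fetch) -> bytes:
    """Fetch one range from each source in parallel, then concatenate them
    in offset order (the 'cat the files onto each other' step)."""
    segments = plan_segments(size, len(sources))
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(fetch, src, off, length)
                   for src, (off, length) in zip(sources, segments)]
        return b"".join(f.result() for f in futures)
```

This only helps when the bottleneck is at the sending end, which is the point made earlier in the thread about overloaded FTP sites.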

--
SIG: HUP

Re:FTP already provides for this by Anonymous Coward · 2001-12-13 03:16 · Score: 0

You didn't read the article did you? Neither did the moderators who bumped you to a +3. Great stuff guys, keep up the good work.

There's always meta-mod.
Re:FTP already provides for this by autocracy · 2001-12-13 03:24 · Score: 2

This response was not to the article, just the others who posted info about "$PROGRAM does this!" See my other comment in this story.

--
SIG: HUP

Uhh... my shit detector just went off by tzanger · 2001-12-13 02:21 · Score: 1, Flamebait

From the article:

Meltzer recalled a job where the client had a 32-Mbit/second connection available but was getting a throughput of 0.5 Mbits/s. "It wasn't a question of mere bandwidth. They had too much turnaround," he said.

Um... if you're getting 500kbps on a 32Mbps connection your protocol stinks. 1/64th of your available bandwidth isn't FTP's fault, nor is it TCP's. Either there was a severe bottleneck somewhere between the endpoints, or the protocol was designed to minimize throughput.

More shit:

"FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes," said Charlie Oppenheimer, vice president of marketing at Digital Fountain.

They may have better bandwidth but the latency sucks. Furthermore, I've never had FTP destroy my packets. It either made it or it didn't, and it makes it 100% of the time, barring connection failure.

Sorry. I don't buy it. Yeah sending over UDP gives you less hassle than TCP but now you have to take into account all the sequencing and data transfer checks. Not terribly difficult but no rocket science, either.

Re:Uhh... my shit detector just went off by per+unit+analyzer · 2001-12-13 03:06 · Score: 1

It's possible that 500 kbps on a 32 Mbps link was ftp's and TCP's fault. Most TCP implementations in use today have default window sizes that will severely limit your throughput. And if the TCP implementation doesn't support the large window size option the max your TCP window can be is 64K. TCP Window size is the amount of data that can be sent before TCP needs to wait for an ACK. So if your window is 64K, you send 64K and wait for the ACK, send 64K and wait for the ACK, etc. This isn't really a problem over short, relatively slow links, but on any network where there is a large bandwidth*delay product this will be a problem because the link ends up idle for a good amount of time.

For example, let's look at a situation where I need to transfer data from the east coast to the west coast and my round-trip time is 70 ms. If I have a 32 Mbps link, I can send 64K in about 2 ms. So I have to wait 68 ms until I send another 64K. This gives an effective throughput of less than 1 Mbps. Throw in the slightest amount of congestion and things go even further downhill real fast.
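The windowing arithmetic is worth working through explicitly. A quick Python check, assuming "64K" means the classic 64 KB (65,536-byte) maximum window without window scaling: serializing 64 KB at 32 Mbit/s takes about 16 ms, and one window per 70 ms round trip caps throughput near 7.5 Mbit/s, still under a quarter of the link. The "about 2 ms" and "less than 1 Mbps" figures above correspond to a 64-kilobit (8 KB) window. Either way, keeping a 32 Mbit/s, 70 ms path full needs a window near the bandwidth-delay product of about 280,000 bytes.

```python
def window_limited(window_bytes: int, link_bps: float, rtt_s: float):
    """Return (time to serialize one window in s, best-case throughput in bit/s).

    With a fixed window, TCP can have at most one window in flight per round
    trip, so throughput is capped at window / RTT (or the link rate, if lower).
    """
    window_bits = window_bytes * 8
    send_time = window_bits / link_bps
    throughput = min(link_bps, window_bits / rtt_s)
    return send_time, throughput


def bdp_bytes(link_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: the window needed to keep the link full."""
    return round(link_bps * rtt_s / 8)


for label, window in [("64 KB window", 64 * 1024), ("64 Kbit window", 64 * 1024 // 8)]:
    t, bps = window_limited(window, 32e6, 0.070)
    print(f"{label}: {t * 1000:.1f} ms to send, {bps / 1e6:.2f} Mbit/s max")
print("window needed:", bdp_bytes(32e6, 0.070), "bytes")
```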

This is a well-known problem that has been addressed. Special implementations of ftp clients and servers are available free (as in beer) to take advantage of large TCP windows. It sounds like these guys are trying to use this phenomenon to sell their new compression algorithm.

And by the way, I don't buy their spin about getting "data from one place to another without actually transmitting the data." This is applying compression directly at the communications channel, pure and simple. This is analogous to a (hypothetical) ftp client that gzips everything before it is put on the wire and sends it to an ftp server that automatically gunzips it at the other end....

--z

--
In Soviet Russia, the Beowulf cluster imagines you!
Re:Uhh... my shit detector just went off by tzanger · 2001-12-13 03:29 · Score: 2

For example, let's look at a situation where I need to transfer data from the east coast to the west coast and my round-trip time is 70 ms. If I have a 32 Mbps link, I can send 64K in about 2 ms. So I have to wait 68 ms until I send another 64K. This gives an effective throughput of less than 1 Mbps. Throw in the slightest amount of congestion and things go even further downhill real fast.

Thank you for that detailed explanation. It seems that my shit detector is a little trigger-happy. :-) Your comment on the "magic algorithm" is spot-on. I could think of a way to pre-determine an algorithm which recodes 64k chunks of data by precomputing all possible 64k datastreams and then simply selecting the datastream which represents that 64k (or whatever size). Anyway, the other comments in this article have already attacked the idea for its use of UDP and its lack of net-friendliness. I predict a number of angry customers who paid $300k for a pair of these and whose data is getting dropped or throttled like crazy by their ISPs or the ISPs in the middle. :-)

TCP Fair? by Mdog · 2001-12-13 02:21 · Score: 1

In my networking class last year, they talked about new protocols having to be "TCP fair," in that they don't gain their advantages over the standard TCP by just cutting in line in front of other TCP packets...I wonder if this new algorithm claims to keep that in mind. The scenario to avoid is everybody who's 31337 switching to this new stuff, thereby slowing down the other half to gain their speedup.

Conspiracy theory? Yes. But hey, this is /.

--
Slashdot 's editors are dickheads

Re:TCP Fair? by adadun · 2001-12-13 03:55 · Score: 1

They achieve TCP-friendly congestion control by using layered multicast. Data is sent on multiple multicast groups; a high-bandwidth receiver joins more groups than a low-bandwidth one. The transmission rates and the interarrival times between so-called synchronization points can be set up in ways that make the transmission TCP-friendly.
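One common way to make "TCP-friendly" concrete is to cap a flow at roughly what a TCP flow would get under the same round trip time and loss rate, using the well-known simplified estimate rate ≈ (MSS/RTT)·sqrt(3/(2p)). The sketch below combines that estimate with a receiver deciding how many cumulative multicast layers to join. It is only an illustration under stated assumptions, not Digital Fountain's actual mechanism; the layer rates and receiver parameters are hypothetical.

```python
import math


def tcp_friendly_rate_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Simplified steady-state TCP throughput estimate: (MSS/RTT) * sqrt(3 / (2p))."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(3 / (2 * loss_rate))


def layers_to_join(layer_rates_bps, budget_bps):
    """Join cumulative multicast layers, lowest first, while the total stays within budget."""
    total, joined = 0.0, 0
    for rate in layer_rates_bps:
        if total + rate > budget_bps:
            break
        total += rate
        joined += 1
    return joined


# Hypothetical receiver: 1460-byte segments, 100 ms RTT, 1% loss.
budget = tcp_friendly_rate_bps(1460, 0.100, 0.01)
layers = [128e3, 128e3, 256e3, 512e3, 1024e3]  # hypothetical per-layer rates
print(f"TCP-friendly budget ~{budget / 1e6:.2f} Mbps -> join {layers_to_join(layers, budget)} layers")
```

A receiver seeing more loss gets a smaller budget and drops layers, which is how a layered scheme backs off without per-packet feedback to the sender.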

moron, you didn't read the article by Anonymous Coward · 2001-12-13 02:21 · Score: 0

User would go to download a game demo or something, receive pieces from several different places, and knit them together?

There is no download happening at all here. It is doing something like this: the so-called receiver is building a binary representation of a CRC. Both sender and receiver use proprietary hardware. This isn't fucking GoZilla you fucking fucktards!

Article is wrong by saridder · 2001-12-13 02:22 · Score: 4, Interesting

The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..."

This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed. The only time you get a 1 for 1 ack ratio is if you miss a packet and the window slams shut.
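For reference, the exponential growth described here is what TCP calls slow start. Strictly speaking it is the sender's congestion window that grows, by roughly one segment per ACK, so it about doubles every round trip until it reaches the slow-start threshold, after which it grows by about one segment per round trip. A loss-free sketch, assuming an initial window of one segment and a hypothetical threshold of 16:

```python
def slow_start_schedule(ssthresh, rtts, initial_cwnd=1):
    """Congestion window (in segments) at the start of each round trip, with no losses."""
    cwnd = initial_cwnd
    schedule = []
    for _ in range(rtts):
        schedule.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: about doubles per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 segment per RTT
    return schedule


print(slow_start_schedule(ssthresh=16, rtts=8))  # [1, 2, 4, 8, 16, 17, 18, 19]
```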

Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's. Frame Relay switches may drop packets if you exceed your CIR and begin bursting, so that whole transfer will never succeed. Therefore you actually waste bandwidth cause the whole transfer is doomed to fail, and the sender will never know it.

Also some routers have WRED configured in their queues, purposely dropping TCP packets to increase bandwidth on a global scale. This would damage the file transfer process as well if it was UDP based, as this system is.

Stick with the RFC's and the tried and true TCP transport system. This company will fail.

--
--- RFC 1149 Compliant.

Re:Article is wrong by Anonymous Coward · 2001-12-13 02:50 · Score: 1, Informative

I think the writer is almost right. In FTP, the packets are ordered, but they don't need to arrive in order, since you can reorder them. However if a file consists of 10 packets, you need all ten packets to recover the document. If you use erasure codes, you can take the 10 packets and generate 15 encoded packets. Once you get ANY 10 of the 15 packets, you can calculate the original file. This is cool because any given packet can be lost, so UDP is just fine to transfer each packet.

Look at google for information about forward error correcting, erasure codes, tornado codes, fcast and/or luigi rizzo. There is more to this topic than PR from business-types :)

I understand that Microsoft used fcast to multicast nightly builds over a LAN and found the rate-limiting step to be the write speed of the hard drives; the math to generate the message is not a serious strain for modern CPUs. If all of us could speak fcast, getting the latest Linux/*BSD distro via cable modem would go a lot faster.
Re:Article is wrong by StormyWeather · 2001-12-13 03:28 · Score: 2, Insightful

Sliding windows is flow control, not error recovery.

I would trust UDP with anything I would trust TCP with, as long as the application does the error checking on the data, which is exactly what they are saying their product does. TCP has really high overhead compared to UDP, and it's not always necessary. One of the reasons for TCP was so that programmers wouldn't have to deal with as much, but if you can make something that handles it more efficiently, then you only have to send a retransmit request whenever there is lost data, and not after every window.

Maybe it's my tendency to fight for the underdog, but I feel UDP has gotten the shaft. It's a great way to slam traffic around, and as secure as your application is written to make it.

Nice little doc over TCP and sliding windows for anyone that might want one.
Re:Article is wrong by Minupla · 2001-12-13 03:31 · Score: 5, Interesting

Stick with the RFC's and the tried and true TCP transport system. This company will fail.

You may be right, they may fall on their noses. Or the win with their system might be large enough that we decide that we need to rework the way we do some things. Hard to say at this point.

I do take issue, though, with your 'Stick with the RFCs' comment. If we stick with what we know works, we will never evolve. If the folks at CERN had had this attitude, we'd still be using GOPHER (side note: my favorite interview question for testing people who claim to have long-time Net experience is 'Explain the difference between Archie and Veronicia'.)

GOPHER was the tried and true text retrieval system. Besides, UDP has an RFC, and is a perfectly legit method of moving data, provided you accept its limitations and ensure you correct for them. TCP has a lot of overhead that is not always required. If our network breaks because someone sends UDP, it's we who need to reread our RFCs.

'Nuff Said.

--
On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
Re:Article is wrong by saridder · 2001-12-13 03:43 · Score: 1

From what I read, it does do error checking, but it also said that if it doesn't get enough of the UDP packets, the whole transfer is lost. At least with TCP, if I lose a few packets, I won't have to resend the entire file, just the requested packets.

It sounds fine for a multicasting application over a LAN, it's just that I wouldn't trust it over the Internet.

--
--- RFC 1149 Compliant.
Re:Article is wrong by srichman · 2001-12-13 03:50 · Score: 2

The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..." This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed.
TCP still acknowledges every packet; it just batches acknowledgements as an optimization. An important point, though, is that if you start dropping packets (or if they arrive out of order) at a moderate rate (could even be a small loss rate, depending on your delay and bandwidth), fast retransmit (introduced in TCP Tahoe, with fast recovery added in Reno) will cause you to acknowledge every packet.
In contrast, UDP never acknowledges (or disacknowledges) a packet.
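The duplicate-ACK behaviour behind fast retransmit can be sketched as follows. This is a simplified model with hypothetical helper names: ACKs carry the next expected segment index rather than a byte sequence number, the receiver ACKs every arriving segment, and the sender retransmits after three duplicate ACKs, the conventional threshold.

```python
def cumulative_acks(arrived_seqs):
    """Receiver side: the ACK (next expected segment) sent after each arrival."""
    received = set()
    next_expected = 0
    acks = []
    for seq in arrived_seqs:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)  # out-of-order arrivals repeat the same ACK
    return acks


def fast_retransmit_triggers(acks, dup_threshold=3):
    """Sender side: segments retransmitted once `dup_threshold` duplicate ACKs arrive."""
    retransmits = []
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:
                retransmits.append(ack)
        else:
            last_ack, dups = ack, 0
    return retransmits


# Segment 2 is lost; segments 3..6 still arrive and each elicits a duplicate ACK for 2.
acks = cumulative_acks([0, 1, 3, 4, 5, 6])
print(acks)                            # [1, 2, 2, 2, 2, 2]
print(fast_retransmit_triggers(acks))  # [2]
```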
Re:Article is wrong by saridder · 2001-12-13 03:50 · Score: 1

Also, I'd have to disagree with you that sliding windows isn't error recovery. I agree that sliding windows is a flow control mechanism, but it also serves as error recovery at the packet level if a packet isn't received.

--
--- RFC 1149 Compliant.
Re:Article is wrong by Anonymous Coward · 2001-12-13 03:55 · Score: 0

if it doesn't get enough of the UDP packets, the whole transfer is lost.

You're talking about an application-specific feature. UDP/TCP are much lower-level than this.
Re:Article is wrong by saridder · 2001-12-13 03:59 · Score: 1

No I'm talking about this UDP transporter. See article's quote below...

"We send recipes, not pieces of content," said Clifford Meltzer, chief executive of Digital Fountain. "Once you get enough [of the packets] coming in, Spock appears. If you get 98 percent of the packets, you get nothing."

--
--- RFC 1149 Compliant.
Re:Article is wrong by dieman · 2001-12-13 04:55 · Score: 1

Wrong. UIUC, not CERN, eh? Plus, it's not like gopher would have stayed static.

--
-- dieman - Scott Dier
Re:Article is wrong by hardburn · 2001-12-13 04:55 · Score: 1

Stick with the RFC's and the tried and true TCP transport system. This company will fail.

UDP is just fine for this application because you don't need to receive every single packet to reconstruct the entire set of data. Read up on Tornado codes.

Only time will tell if this is a good idea or not. If anything, they will probably fail because they created a proprietary standard, and fall away into obscurity (or get bought by one of the big players).

--
Not a typewriter
Re:Article is wrong by seanadams.com · 2001-12-13 05:35 · Score: 4, Interesting

Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's.

You're missing the point of UDP. UDP is just a *tiny* layer on top of IP, which adds a little extra information (basically the port number) so that the OS can deliver a packet to the right application. UDP can not be compared with TCP - it doesn't provide reliability and flow control, and it has absolutely no notion of a stream of data. If desired, these can be provided in the application layer (see UDP, TFTP, NFS, etc. etc.)
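To make "tiny layer" concrete: the entire UDP header is four 16-bit fields (source port, destination port, length, checksum), 8 bytes in total, as defined in RFC 768. A minimal sketch with Python's `struct` module; the helper names are hypothetical, and the checksum is left at zero, which for UDP over IPv4 means "no checksum computed".

```python
import struct

# Source port, destination port, length, checksum, all in network byte order.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend the 8-byte UDP header; the length field covers header plus payload."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_header(datagram):
    """Split a datagram back into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {"src_port": src_port, "dst_port": dst_port, "length": length,
            "checksum": checksum, "payload": datagram[UDP_HEADER.size:length]}


dgram = build_udp_datagram(5353, 53, b"example query")
print(UDP_HEADER.size, parse_udp_header(dgram))
```

Everything else (ordering, retransmission, pacing) is left to the application.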

TCP is a reliable transport, but it's much, much more than that. First off, the fact that you're using TCP doesn't make the path between sender/receiver any more reliable. Your packets get dropped just the same as if they were UDP or any other protocol over IP. TCP provides reliability by retransmitting lost packets, but you knew that. It also provides flow control and congestion avoidance - this means detecting when the receiving end (and the router queues in between) are ready for more data, and throttling your transmission rate to match that capacity. It also means being "fair" to other streams sharing the same bottleneck(s). It does this by "backing off" the number of packets in flight, i.e. halving the congestion window, to reduce the data rate. These algorithms are a very active field of research - there's a *lot* more to TCP than meets the eye of a socket programmer.
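The fairness effect of halving can be seen in a toy model of two flows sharing one bottleneck, a sketch in the style of the classic additive-increase/multiplicative-decrease (AIMD) analysis. It assumes both flows see loss in the same round whenever their combined window exceeds capacity, which real networks only approximate; the names and numbers are hypothetical.

```python
def aimd_two_flows(a, b, capacity, rounds, increase=1.0, decrease=0.5):
    """Two AIMD flows on one bottleneck; windows are in packets per RTT."""
    for _ in range(rounds):
        if a + b > capacity:   # queue overflows: both flows detect loss and back off
            a, b = a * decrease, b * decrease
        else:                  # no loss: each flow probes for one more packet
            a, b = a + increase, b + increase
    return a, b


a, b = aimd_two_flows(a=1.0, b=30.0, capacity=40.0, rounds=200)
print(f"flow A: {a:.2f} packets/RTT, flow B: {b:.2f} packets/RTT")
```

Additive increase leaves the gap between the two flows unchanged, while each halving cuts it in half, so the windows converge toward an equal share.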

When TCP loses a packet, that packet must be retransmitted. This is expensive because it means another RTT.

Anyhoo...

You can think of FEC for data transmission as being analogous to RAID 5 for storage. By adding one extra bit (the XOR of the rest of the word) you can lose any single bit and still know its value. It's very simple. If the word is:

0 1 0 1

And I add an extra "parity" bit, where the parity bit is 1 if the number of ones in the rest of the word is odd, and 0 if it's even:

0 1 0 1 [0]

I can now lose any one bit (including of course the parity bit). Eg if I have only

0 1 X 1 [0]

Then I know the missing bit is a '0', because if it were a '1', the parity bit would be a one.
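The same parity trick as a short Python sketch (hypothetical helper names): even parity is computed with XOR, and whichever single bit is missing, including the parity bit itself, is rebuilt from the others.

```python
from functools import reduce
from operator import xor


def add_parity(bits):
    """Append an even-parity bit: the XOR of all data bits."""
    return bits + [reduce(xor, bits, 0)]


def recover_missing(word):
    """Fill in a single missing bit (None); works for data bits and the parity bit."""
    # Under even parity the XOR of the whole word is 0, so the missing bit
    # equals the XOR of all the bits that did arrive.
    missing_value = reduce(xor, (b for b in word if b is not None), 0)
    return [missing_value if b is None else b for b in word]


print(add_parity([0, 1, 0, 1]))             # [0, 1, 0, 1, 0]
print(recover_missing([0, 1, None, 1, 0]))  # [0, 1, 0, 1, 0]
```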

Applying this to data transmission, you can see that by sending just one extra packet, you greatly reduce the chance of having to retransmit anything.

E.g., if I have to send you 10 packets over a link with 10% packet loss, there's a 65% chance that I'll have to retransmit at least one of those packets (and a 10% chance that each retransmitted packet will have to be sent again, and so on).

However, if I'm using FEC and I send one extra "parity" packet, then I only have to retransmit if TWO OR MORE packets are lost. The chance of losing two or more of the eleven packets is only 30%, so you can see that for an overhead of 10%, I've reduced the number of retransmits by a factor of more than two! I hope those figures are right. I used this tool to calculate them. Of course there are a lot of knobs you can turn depending on how much overhead you can afford for the parity information, and what degree of packet loss you want to be able to tolerate.
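The figures can be reproduced with a binomial calculation. This sketch assumes independent losses at 10% per packet and a single parity packet that can repair exactly one loss; the function name is hypothetical.

```python
from math import comb


def prob_at_least(n, k, p):
    """P(at least k of n packets are lost), each lost independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.10
without_fec = prob_at_least(10, 1, p)  # any loss among 10 packets forces a retransmit
with_fec = prob_at_least(11, 2, p)     # 10 data + 1 parity: only 2+ losses force one
print(f"{without_fec:.1%} vs {with_fec:.1%}")  # 65.1% vs 30.3%
```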

Anyway, you can see that there are lots of possible improvements/alternatives to TCP - it's an old protocol and a lot of research has been done since RFC 793.
Re:Article is wrong by seanadams.com · 2001-12-13 05:39 · Score: 2

(see UDP, TFTP, NFS, etc. etc.)

Er... I meant (see DNS, TFTP, NFS, etc. etc.)
Re:Article is wrong by saridder · 2001-12-13 05:53 · Score: 1

Thanks. But what if the parity packet is lost? I would think that the parity packet has just as much chance of getting lost as any other packet in the stream. How do you validate the previous packets that the parity is supposed to validate?

--
--- RFC 1149 Compliant.
Re:Article is wrong by evilviper · 2001-12-13 05:58 · Score: 2

"We send recipes, not pieces of content," said Clifford Meltzer, chief executive of Digital Fountain. "Once you get enough [of the packets] coming in, Spock appears. If you get 98 percent of the packets, you get nothing."

I think you are misreading this quote... They are just saying that you can't download part of a file and have the file work partially... I.E. downloading a movie on Gnutella often results in half-movies when the server disconnects, but you are still able to watch the first half of that video. This doesn't imply anything about a limitation in UDP.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:Article is wrong by evilviper · 2001-12-13 06:02 · Score: 1

Sliding Windows is purely flow control. ACKnowledgements may be an error recovery mechanism, but all sliding windows are is a method to send fewer ACKs with the same error correction ability that one ACK per packet would accomplish.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:Article is wrong by seanadams.com · 2001-12-13 06:05 · Score: 2

But what if the parity packet is lost?

Parity bits are also used in serial data transmission to confirm data integrity: if the parity bit is wrong, then you can be sure that there's something wrong with the link layer.

In this case, the parity packet is being used to recover *lost* data, not to confirm data integrity. You would still have a per-packet checksum to ensure that each packet is valid.
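As an illustration of such a per-packet check, here is the 16-bit one's complement "Internet checksum" from RFC 1071, the algorithm UDP and TCP use. This is a minimal Python sketch; the real UDP/TCP checksum also covers a pseudo-header of IP addresses, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in (end-around carry)
    return ~total & 0xFFFF


packet = b"hello, fountain"  # 15 bytes, so it gets padded
csum = internet_checksum(packet)
# Verification: the padded data plus its checksum sums to all ones, so the result is 0.
print(hex(csum), internet_checksum(packet + b"\x00" + csum.to_bytes(2, "big")))
```

A receiver that runs the same function over the data plus the transmitted checksum gets 0 when nothing was corrupted.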

Bit errors are extremely rare on the Internet. Normally when there is a significant chance of data corruption, say on a wireless or analog modem line, you would have strong error checking in the link layer.

Lost packets, on the other hand, are very common. In fact they're an intended consequence of TCP: you send data as fast as possible until packets are dropped, and then you back off. If you're getting lots of loss, it doesn't mean there's something "broken" with the network, it just means that the link is over-saturated.
Re:Article is wrong by saridder · 2001-12-13 06:14 · Score: 1

I get that. But consider a UDP-based transport system, even one using a Tornado-type transfer (which, if I understand the logic, packetizes a file in a proprietary numbered scheme, sends multiple copies of each individual packet to the clients in a "storm cloud" type transmission, and if a client catches a certain percentage of the "cloud" it will have enough to reconstruct the original file).

There's no guarantee that the client will receive the entire file, and since there is no retransmit function in UDP like there is in TCP, the server will have to retransmit the ENTIRE file again instead of just the missing packets, which is a waste of bandwidth. Plus those clouds of data are a waste of bandwidth. Like I said earlier, it's a fine, if expensive, system over a high-speed LAN, but it does not optimize WAN bandwidth.

--
--- RFC 1149 Compliant.
Re:Article is wrong by saridder · 2001-12-13 06:23 · Score: 1

Bit errors aren't that uncommon. Do a sh int serial[#] on a Cisco router and you'll see a number of errors on that line.

So a parity packet recovers lost packets? But are there sequence numbers for the parity packet? How does it know which previous packets this parity packet is for?

--
--- RFC 1149 Compliant.
Re:Article is wrong by seanadams.com · 2001-12-13 06:32 · Score: 2

Bit errors aren't that uncommon. Do a sh int serial[#] on a Cisco router and you'll see a number of errors on that line.

Right - it's detecting the error at the link layer (using HDLC or PPP checksums, I guess), and dropping the packet at that point. The Cisco doesn't forward the packet if it knows it's corrupted.

So a parity packet recovers lost packets? But are there sequence numbers for the parity packet? How does it know which previous packets this parity packet is for?

I don't know the details of any protocols that do this, but yes, there would have to be something like sequence numbers to associate packets with the correct parity information. I haven't read the article yet (shame on me) but it sounds like this protocol is much more sophisticated than the one I've described.
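One possible layout, purely as a hypothetical sketch and not any specific protocol: each packet carries a group ID and an index within the group, and the last index in each group is the XOR parity of that group's data packets, so the receiver knows exactly which packets a parity packet covers.

```python
from functools import reduce


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode_group(group_id, payloads):
    """Return (group_id, index, data) packets; index len(payloads) is the parity packet.

    Payloads are zero-padded to a common length; a real protocol would also
    need to carry the original lengths.
    """
    size = max(len(p) for p in payloads)
    padded = [p.ljust(size, b"\x00") for p in payloads]
    packets = [(group_id, i, d) for i, d in enumerate(padded)]
    packets.append((group_id, len(padded), reduce(xor_bytes, padded)))
    return packets


def decode_group(received, group_id, n_data):
    """Rebuild one group's data packets, tolerating at most one missing packet."""
    by_index = {idx: data for gid, idx, data in received if gid == group_id}
    missing = [i for i in range(n_data + 1) if i not in by_index]
    if len(missing) > 1:
        raise ValueError("two or more packets lost; a single parity packet cannot recover")
    if missing and missing[0] < n_data:
        # XOR of all surviving packets (parity included) equals the missing data packet.
        by_index[missing[0]] = reduce(xor_bytes, by_index.values())
    return [by_index[i] for i in range(n_data)]


packets = encode_group(7, [b"abc", b"defg", b"hi"]) + encode_group(8, [b"xyz", b"uvw", b"rst"])
survivors = [p for p in packets if not (p[0] == 7 and p[1] == 1)]  # lose data packet 1 of group 7
print(decode_group(survivors, 7, 3))
```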
Re:Article is wrong by StormyWeather · 2001-12-13 11:39 · Score: 1

No, you can do retrans out of the application rather than the protocol.
Re:Article is wrong by Anonymous Coward · 2001-12-13 18:03 · Score: 0

'Explain the difference between Archie and Veronicia'

Its spelled Veronica.
Re:Article is wrong by saridder · 2001-12-14 01:00 · Score: 1

The only point I'm trying to get across is that it is better to have a packet retransmitted instead of the whole file.

Also, I am not a programmer, but it is my understanding that applications do not know what protocols are, and that is handled by the OS. So I don't *think* (I'm guessing) an app can ask for a packet retransmit, just a file retransmit.

--
--- RFC 1149 Compliant.
Re:Article is wrong by Anonymous Coward · 2001-12-14 17:46 · Score: 0

Isn't it nice to see that meta-moderation hasn't put an end to infantile and unfair moderators.
Re:Article is wrong by Anonymous Coward · 2001-12-16 13:00 · Score: 0

It's spelled "it's", buddy.

PostScript for data by swb · 2001-12-13 02:22 · Score: 2

This sounds a lot like what PostScript is to a rasterized file. A set of descriptions of what the picture looks like, which are small and easy to transmit, which are then drawn to produce the picture.

With real vector PS it's easy, since you start out by creating vectors (e.g., Adobe Illustrator). How you get from a non-vector "destination" to the metadata you really want to transmit sounds complicated.

Udpcast by BlueUnderwear · 2001-12-13 02:22 · Score: 2

Udpcast in FEC mode does this too: in addition to the original data, it can transmit an arbitrary amount of "FEC" blocks which are a linear combination of the data blocks. If some data blocks are lost in transit, udpcast can recalculate them from the FEC blocks by multiplying the vector of received data by the inverse encoding matrix.

--
Say no to software patents.

prozilla by Anonymous Coward · 2001-12-13 02:23 · Score: 0

prozilla does the same job under linux.

XOR = advanced algorithm by null+etc. · 2001-12-13 02:23 · Score: 2, Informative

Quoted from the article:

In this case, the Transporter Fountain creates not equations but hundreds of millions of "symbols" which can be used to reconstruct the data. The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. The receiving box then performs an XOR operation on the symbols to derive the original data.

So, assuming that each "symbol" is at least one byte, then creating "hundreds of millions" of these symbols would result in hundreds of megabytes of data. Furthermore, the guy quoted 20MB as being a large amount of data to send.

Conclusion: Only sales & marketing would try to sell a product that turns 20MB into 100MB, sends it via UDP, only in order to have the results XOR'd together.

Where do they get these people?
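For what it's worth, here is what recovering data by XOR-ing symbols can look like. Below is a toy sketch in the spirit of LT (Luby Transform) codes, not Digital Fountain's actual algorithm: each symbol is the XOR of a random subset of source blocks, and the receiver "peels" degree-one symbols until everything is recovered. The degree distribution is made up for illustration (real codes use a carefully tuned one), and the receiver is handed each symbol's block indices directly, whereas a real system would derive them from a seed carried in the packet. All names are hypothetical.

```python
import random


def lt_encode_symbol(blocks, rng):
    """One encoded symbol: the XOR of a random subset of the source blocks."""
    degree = rng.choice([1, 1, 2, 2, 2, 3, 4])  # toy degree distribution (assumption)
    idx = set(rng.sample(range(len(blocks)), min(degree, len(blocks))))
    value = 0
    for i in idx:
        value ^= blocks[i]
    return idx, value


def lt_receive_and_decode(blocks, seed=1):
    """Collect symbols until peeling recovers every block; returns (decoded, symbols_received)."""
    rng = random.Random(seed)
    k = len(blocks)
    recovered = {}
    pending = []  # [index_set, xor_value] pairs not yet fully used
    received = 0
    while len(recovered) < k:
        idx, value = lt_encode_symbol(blocks, rng)  # "receive" one more symbol
        received += 1
        pending.append([idx, value])
        progress = True
        while progress:  # keep peeling while degree-one symbols appear
            progress = False
            for sym in pending:
                for i in [i for i in sym[0] if i in recovered]:
                    sym[0].discard(i)  # XOR out blocks we already know
                    sym[1] ^= recovered[i]
                if len(sym[0]) == 1:
                    recovered[sym[0].pop()] = sym[1]
                    progress = True
            pending = [s for s in pending if s[0]]
    return [recovered[i] for i in range(k)], received


rng0 = random.Random(0)
data = [rng0.getrandbits(32) for _ in range(50)]  # 50 hypothetical 32-bit source blocks
decoded, n_symbols = lt_receive_and_decode(data)
print(decoded == data, n_symbols, "symbols received for", len(data), "blocks")
```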

Re:XOR = advanced algorithm by anthony_dipierro · 2001-12-13 03:09 · Score: 1

Conclusion: Only sales & marketing would try to sell a product that turns 20MB into 100MB, sends it via UDP, only in order to have the results XOR'd together.

Absolutely. The only usefulness for this would be in a situation where two-way communication is simply not possible. There are improvements which can be made to TCP for file transfers, but dropping all feedback mechanisms is simply stupid.

As far as only a certain number of packets being required to be received on the other end, that is either completely false, or the overhead is astronomical. More likely it is just plain wrong, or this product is complete vaporware to begin with.
Re:XOR = advanced algorithm by BlueUnderwear · 2001-12-13 03:42 · Score: 2
Absolutely. The only usefulness for this would be in a situation where two-way communication is simply not possible. There are improvements which can be made to TCP for file transfers, but dropping all feedback mechanisms is simply stupid.
Dropping all feedback is interesting in situations where:
- there is a very high latency, and feedback would arrive so late that it would seriously slow things down (i.e. satellite links)
- the return channel is costly (again, satellite links, with a terrestrial (dial-in) return channel). If you can do away with the return channel, you save the cost of dialing in, which is interesting if you are on a metered telco, as is common in Europe.
As far as only a certain number of packets being required to be received on the other end, that is either completely false, or the overhead is astronomical. More likely it is just plain wrong, or this product is complete vaporware to begin with.
This is feasible in a way to make the number of extra packets tuneable to the expected loss.
The algorithm is indeed based on XOR, but that's not the only component. The idea is to define a field on the set of all byte (or short, or word...) values. You use XOR as the addition, and Galois multiplication as the multiplication.
Then you treat your data blocks as vectors, of which you can do linear combinations in the Galois field.
If you have n data blocks to transmit, and want to add k redundant blocks, you first arrange your n data blocks in an n-element vector. Then you multiply that vector by an (n+k)-by-n Vandermonde matrix to obtain a new vector of n+k elements. Those n+k elements are the blocks which will actually be transmitted.
A Vandermonde matrix is a matrix having the following form:
1 x_1^1 x_1^2 x_1^3 ...
1 x_2^1 x_2^2 x_2^3 ...
1 x_3^1 x_3^2 x_3^3 ...
1 x_4^1 x_4^2 x_4^3 ...
.......
A square Vandermonde matrix (with distinct x_i) has the interesting property of being invertible. Moreover, a subset of rows of a Vandermonde matrix is still a Vandermonde matrix. Losing packets is equivalent to dropping rows (as each one of the n+k transmitted packets is obtained by multiplying one row of the matrix by the vector of the original n packets).
The receiver just calculates the remaining Vandermonde matrix (by striking out the rows that correspond to the dropped packets, plus some more if less than k were dropped), and inverts the remaining matrix. By multiplying this inverse with the vector of received blocks, the receiver can obtain the original vector of n data packets.
The nice thing about this is that k is freely tuneable (as long as the field is big enough: but if you define your field over 4 byte values, that should not be a problem). So, you just take a value for k that matches the expected loss rate plus some comfortable safety margin, and you're set. Considering that, turning 20MB into 100MB will only be necessary if you expect to lose 4 packets out of 5...
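The scheme described above can be sketched in Python. To keep the arithmetic short, this swaps the GF(2^w) field (XOR addition, Galois multiplication) for the prime field of integers modulo 257; the structure is the same, with encoding as multiplication by an (n+k)-by-n Vandermonde matrix and decoding by inverting the square Vandermonde matrix of whichever n rows arrived. Function names are hypothetical; practical implementations typically use GF(2^8) or larger, partly because values mod 257 can reach 256 and no longer fit in a byte.

```python
P = 257  # prime modulus, a stand-in for the GF(2^w) field described above


def vandermonde_rows(xs, n):
    """Rows [1, x, x^2, ..., x^(n-1)] mod P for each evaluation point x."""
    return [[pow(x, j, P) for j in range(n)] for x in xs]


def encode(data, k):
    """Multiply the n data symbols by an (n+k)-by-n Vandermonde matrix; returns (x, value) pairs."""
    n = len(data)
    xs = list(range(1, n + k + 1))  # distinct points; needs n + k <= 256
    rows = vandermonde_rows(xs, n)
    return [(x, sum(r * d for r, d in zip(row, data)) % P) for x, row in zip(xs, rows)]


def solve_mod_p(matrix, rhs):
    """Gauss-Jordan elimination mod P for a square, invertible system."""
    n = len(matrix)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col])
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)  # modular inverse via Fermat's little theorem
        a[col] = [v * inv % P for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(v - f * w) % P for v, w in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]


def decode(received, n):
    """Recover the n data symbols from any n distinct (x, value) pairs."""
    pts = received[:n]
    rows = vandermonde_rows([x for x, _ in pts], n)
    return solve_mod_p(rows, [v for _, v in pts])


data = list(b"HELLO!")                                # n = 6 data symbols
sent = encode(data, k=3)                              # 9 symbols on the wire
survivors = [sent[i] for i in (1, 2, 3, 5, 6, 8)]     # any 3 may be lost
print(bytes(decode(survivors, len(data))))
```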
Of course, all this doesn't resolve the issue of flow control, so the sender needs to be tuned manually such as not to emit faster than the physical link can handle.
--
Say no to software patents.
Re:XOR = advanced algorithm by anthony_dipierro · 2001-12-13 04:01 · Score: 1

Considering that, turning 20MB into 100MB will only be necessary if you expect to lose 4 packets out of 5...

If you expect to lose 4 packets out of 5, you need to turn 20MB into a lot more than 100MB. Consider 1MB packets. If you transmit 100 packets, and expect to receive 20 of them, the chances that you transmitted the right 20 is very very slim. And you absolutely cannot produce a scheme where any 20 will give you the right answer. We are assuming your 20MB is already optimally compressed. Therefore there are 2^(20MB) possible messages, and the amount of information that must be transferred is the full 20MB. This information must be transferred reliably and in order. You simply cannot have two different sets of 20MB be equivalent, because if you do, you lose information (pigeonhole principle). So given 19 packets that are received, there is exactly one 1 MB packet which can be received.

It's been a while since I've calculated hamming distances, so I'm not going to get into the exact number of packets that need to be sent, but I hope that my discussion above showed that the number is greater than simply multiplying by the inverse of the expected reliability.
Re:XOR = advanced algorithm by BlueUnderwear · 2001-12-13 04:39 · Score: 2

If you expect to lose 4 packets out of 5, you need to turn 20MB into a lot more than 100MB.
You are right, you need slightly more to cover "standard deviation". If you toss a coin 100 times, it's very rare that you get exactly 50 heads. You may get 40, you may get 60. That's due to standard deviation, but the effect wears off the bigger your sample. So more than 100MB will be necessary, but not substantially more.
If you transmit 100 packets, and expect to receive 20 of them, the chances that you transmitted the right 20 is very very slim. And you absolutely cannot produce a scheme where any 20 will give you the right answer.
Please reread my post. Any 20 are ok, that's the whole point.
This information must be transferred reliably and in order
Order is easy to take care of. Just add a sequence number to your packets to tell them apart (yes, that's another overhead, but it's small).
You simply cannot have two different sets of 20MB be equivalent, because if you do, you lose information (pigeonhole principle). So given 19 packets that are received, there is exactly one 1 MB packet which can be received.
Well, if you re-read my posting, you'll see that yes, any 20 packets would be enough to reconstruct the information (as long as they are different, of course...duplicating a same packet 20 times won't work obviously).
It's been a while since I've calculated hamming distances, so I'm not going to get into the exact number of packets that need to be sent, but I hope that my discussion above showed that the number is greater than simply multiplying by the inverse of the expected reliability.
The number is indeed greater (due to standard deviation), but not substantially so.
If you're interested in the specifics, please read Luigi Rizzo's paper.
You might also want to check out udpcast which provides a working implementation of a FEC algorithm, that works in practice (albeit only on "slice" sizes of up to 128 packets, but that's purely for practical reasons: in theory no such limitation would be necessary).

--
Say no to software patents.
Re:XOR = advanced algorithm by anthony_dipierro · 2001-12-13 05:58 · Score: 1

Well, if you re-read my posting, you'll see that yes, any 20 packets would be enough to reconstruct the information (as long as they are different, of course...duplicating a same packet 20 times won't work obviously).

If any 20 1MB packets are enough to reconstruct the information, then you're sending less than 20MB of information.

Even with a high latency return channel, you're still better off sending some information back. For instance, what if every minute you sent back a single packet with the checksum of all the packets you've received so far. That alone is going to greatly enhance your efficiency. If it's possible (and cost effective) to have two-way communication (if the latency is no greater than about half the total transfer time), that's always going to be more efficient (faster).

The number is indeed greater (due to standard deviation), but not substantially so.

We disagree then only about how substantially so. If the file is 20 megs, we have 1 meg packets, and the loss rate is exactly 5% (one packet), we need to send at least 25 packets. That's an overhead of 20%. I find it hard to come up with a situation where bidirectional communication is possible with a latency less than 50% of the total transfer time, in a full duplex system (or at least full duplex end-to-end but not hop to hop, such as in internet communication), in which it doesn't make sense to not utilize bidirectional communication. The simple information of which packets were dropped is going to drastically increase efficiency. In the above example even with a latency of 50% of the transfer time we'd get a NAK 47.5% of the time and only have to transfer 1 extra packet instead of 5. 47.5% of the time we'd get an ACK and could send 4 packets instead of 5. And 5% of the time the ACK/NAK would be lost and we send all 5 FEC packets.

This is a fine idea for multicasting (although it's probably not the most efficient). For unicasting, unless you have virtually no return connection, it's pretty stupid.
Re:XOR = advanced algorithm by Happy+Monkey · 2001-12-13 06:12 · Score: 2

There's a program called "Mirror" that's out on usenet these days that will reconstruct missing parts of a posted file using the parts you do have and a small number of extra files. I'm not sure how many of each are required, but the number of extra files is very small compared to the number of actual files, and it can reconstruct quite a few missing files. I can't get more specific, but it's pretty cool.

--
__
Do ya feel happy-go-lucky, punk?
Re:XOR = advanced algorithm by TheMidget · 2001-12-13 07:00 · Score: 1

If any 20 1MB packets are enough to reconstruct the information, then you're sending less than 20MB of information.
This is technically true, as you do indeed need to account for space for sequence numbers in packets. However, at 4 bytes per 1456 byte packet, this is negligible (less than a third of a percent).
We disagree then only about how substantially so. If the file is 20 megs, we have 1 meg packets, and the loss rate is exactly 5% (one packet), we need to send at least 25 packets.
That depends on the desired probability of getting the file. Working out the probability that, out of 25 packets, at most five are lost, given that the probability of losing one packet is 5%, we find that with 25 packets the probability of receiving the file is 99.8% (vs. a measly 71.7% if just one extra packet was transmitted).
However, now let's take 200 packets instead, to let the law of large numbers play in our favor:
We now need 21 extra packets to get the same 99.8% confidence that your five extra packets gave you for 20. However, now the overhead is only 10% rather than 20%. And with 2000 packets, we only need 137 redundant packets to achieve 99.8% confidence of successful transmission; that's only 6.8% overhead. The bigger the total number of packets, the closer we get to the theoretically possible 5 percent.
All of this assumes of course that a modern FEC algorithm is used.
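TheMidget's confidence figures can be checked with a short calculation. This sketch assumes independent 5% loss and an ideal code where any n_data distinct packets suffice; the function names are hypothetical.

```python
def success_prob(n_data, extra, p_loss):
    """P(at most `extra` of n_data + extra packets are lost), losses independent."""
    n = n_data + extra
    term = (1 - p_loss) ** n  # P(exactly 0 lost)
    total = term
    for i in range(extra):
        # P(i+1 lost) = P(i lost) * (n - i) / (i + 1) * p / (1 - p)
        term *= (n - i) / (i + 1) * p_loss / (1 - p_loss)
        total += term
    return total


def min_extra(n_data, p_loss, target):
    """Smallest number of redundant packets giving success probability >= target."""
    extra = 0
    while success_prob(n_data, extra, p_loss) < target:
        extra += 1
    return extra


for n_data in (20, 200, 2000):
    k = min_extra(n_data, p_loss=0.05, target=0.998)
    print(f"{n_data} data packets: {k} extra ({k / n_data:.1%} overhead)")
```

For 20 data packets it returns 5, matching the post, and the larger cases show the overhead percentage shrinking as the block grows.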
Re:XOR = advanced algorithm by anthony_dipierro · 2001-12-13 07:49 · Score: 1

The bigger the total number of packets, the closer we get to the theoretically possible 5 percent.

That is true, but the bigger the total number of packets, the longer the total transfer time, and therefore the smaller the fraction of the total transfer time taken up by latency. It only makes sense to use forward error correction if you haven't received an ACK or NAK. In bidirectional one-to-one communication, this will only be true for a small number of packets at the end of the transmission. With bidirectional one-to-many communication, it might be useful if the error rate is sufficiently high and randomly distributed. And with unidirectional communication it is of course necessary.

Re:Link to product by Omnifarious · 2001-12-13 02:23 · Score: 3, Informative

Oh, yes, here's a link to the SourceForge project for SwarmCast.

--
Need a Python, C++, Unix, Linux develop

Tornado Codes by Jonas+Öberg · 2001-12-13 02:24 · Score: 5, Informative

While not actually related, John Byers, Michael Luby and Michael Mitzenmacher wrote a paper on using Tornado codes to speed up downloads. Basically, what they propose is clients accessing a file from more than one mirror in parallel and using erasure codes to make the system feedback-free. The abstract:

Mirror sites enable client requests to be serviced by any of a number of servers, reducing load at individual servers and dispersing network load. Typically, a client requests service from a single mirror site. We consider enabling a client to access a file from multiple mirror sites in parallel to speed up the download. To eliminate complex client-server negotiations that a straightforward implementation of this approach would require, we develop a feedback-free protocol based on erasure codes. We demonstrate that a protocol using fast Tornado codes can deliver dramatic speedups at the expense of transmitting a moderate number of additional packets into the network. Our scalable solution extends naturally to allow multiple clients to access data from multiple mirror sites simultaneously. Our approach applies naturally to wireless networks and satellite networks as well.

I don't have the paper in a computer format, but the number is TR-98-021 and John and Michael were both at Berkeley at the time (1998), so it should be fairly easy to find if someone is interested. Doubtless, a number of other reports should also exist that deal with the same problem but with different solutions.

Re:Tornado Codes by Anonymous Coward · 2001-12-13 02:56 · Score: 0

Actually, this is exactly what Digital Fountain does. For those who don't know, an erasure code represents discrete pieces of the data; however, the pieces overlap, creating redundancy.

The redundancy counters dropped packets, so there is no need for feedback. Freenet has wanted to implement something like this for a while now.
Re:Tornado Codes by NickPest · 2001-12-13 06:12 · Score: 2, Informative

You can find the paper here.
The folks at Digital Fountain are indeed using a highly-tuned (and proprietary) version of tornado codes. I also recommend the following papers if you're interested in what I think has the potential to be the greatest thing since TCP: tornado codes + end-system multicast

<shameless plug>
I'm currently working on a research project with John and others that integrates tornado codes and end-system multicast into a Freenet-like system. Best of all, it's GPL'd!
</shameless plug>
Re:Tornado Codes by Anonymous Coward · 2001-12-13 09:24 · Score: 0

Sounds cool. When do I start to use it? :)
Re: Tornado codes by Omniscient+Ferret · 2001-12-13 13:39 · Score: 1

Tornado codes were supposed to make it into the Freenet project. The chunking would distribute large files amongst nodes much more widely, and you get automatically load balanced requests for speed.

I'm not sure whether it ever made it out of development...

What's going on here? by gotan · 2001-12-13 02:25 · Score: 2

Sorry, but the whole article seems to make some magic mumbo-jumbo out of the process. Apparently the file is transformed, but how does that transformation help? The main difference between UDP and TCP in this case is that TCP maintains the order of packets. So after a file is split up, sent as TCP packets and combined again, all the parts are in the right place. UDP does no such thing, and UDP also doesn't check whether a packet really reached its destination. This frees UDP of some of the overhead TCP has. But to send a large file (with a simple approach), you now have to label each UDP packet with a sequential number. At the end, you check whether all packets arrived (and maybe request missing packets again), then rearrange them according to the sequence numbers.

Now I don't see how a transformation of the content helps here. Instead of adding the information about where in the file the packet goes (a kind of serial number), you now have to label where in the equation it should go (a kind of coefficient index). That way the receiving end knows whether it has all the information, and which information is still missing and must be requested again.
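The "simple approach" described above can be sketched in Python. It covers sequence numbers, a list of what is missing, re-requests and reordering. Everything here is a hypothetical simulation with made-up names and a 20% random loss rate, not any real protocol:

```python
import random

def unreliable_send(chunks, seqs, loss_rate, rng):
    """Send chunks[seq] for each requested seq as a (seq, chunk) datagram.
    Each datagram may be dropped, and the survivors arrive in arbitrary order."""
    arrived = [(seq, chunks[seq]) for seq in seqs if rng.random() >= loss_rate]
    rng.shuffle(arrived)
    return arrived

def transfer(data, chunk_size=4, loss_rate=0.2, seed=1):
    """Naive reliable transfer: sequence numbers, a missing list, re-requests."""
    rng = random.Random(seed)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    received = {}                      # seq -> chunk; arrival order is irrelevant
    missing = set(range(len(chunks)))
    rounds = 0
    while missing:                     # each round = one request and one round trip
        rounds += 1
        for seq, chunk in unreliable_send(chunks, sorted(missing), loss_rate, rng):
            received[seq] = chunk
            missing.discard(seq)
    # Rearrange according to the sequence numbers.
    return b"".join(received[seq] for seq in range(len(chunks))), rounds

original = b"The quick brown fox jumps over the lazy dog"
rebuilt, rounds = transfer(original)
assert rebuilt == original
print(f"rebuilt after {rounds} round(s)")
```

Each pass through the `while` loop stands for one request/response round trip. That back-and-forth is what the Digital Fountain approach claims to eliminate.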

--
"By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks

Re:What's going on here? by bpowell423 · 2001-12-13 02:55 · Score: 3, Informative

That's not it. Because of their algorithm, order doesn't matter, and neither do dropped packets. The receiver only needs any n packets, so the transmitter keeps sending until the receiver says it got enough. Then the results are magically XORed to get the original file. So, no, you don't need a sequential number, you don't need to check if all packets arrived, and you don't have to rearrange them.

More commentary by autocracy · 2001-12-13 02:26 · Score: 2

It's an encoding scheme that sends you the instructions on how to build something rather than the stuff itself. Not so special as they make it sound. Saying that you get the data without it being sent to you is the biggest troll for mid-level clueless managers that want to download their "repr0ns" faster. Not that I'm even sure it will work that well....

--
SIG: HUP

Get a clue - you compression bigots by Anonymous Coward · 2001-12-13 02:26 · Score: 0

That's the part about /. I hate the most: the stupid phuknutz who post without engaging their brain. The company doesn't do compression like you all think; they abstract the data into a series of equations that represent the content. The result is that it doesn't matter which packets you download from the source, just that you download enough of them. Once you have enough packets, you can solve the series of equations (n equations, n unknowns) to get back the original data. Duh.
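The "n equations, n unknowns" idea can be made concrete with a random linear code over GF(2), where "adding" blocks means XORing them. This is a minimal sketch under that assumption. It is not Digital Fountain's actual algorithm, and names like `encode_packet` and `Decoder` are made up. Every packet is the XOR of a random subset of the source blocks. The receiver runs Gaussian elimination until it holds enough independent equations, and it doesn't care which packets were dropped. The source blocks here are random data, so no compression is involved.

```python
import random

def encode_packet(blocks, rng):
    """One encoded packet: a random nonempty subset of the k source blocks,
    given as a bitmask, plus the XOR of the blocks in that subset."""
    mask = rng.randrange(1, 1 << len(blocks))
    value = 0
    for i, block in enumerate(blocks):
        if mask >> i & 1:
            value ^= block
    return mask, value

class Decoder:
    """Gaussian elimination over GF(2). Each packet is one equation:
    the XOR of the blocks named in `mask` equals `value`."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # highest unknown in the row -> (mask, value)

    def add(self, mask, value):
        """Fold in one equation; returns True once k independent ones are held."""
        while mask:
            top = mask.bit_length() - 1
            if top not in self.rows:
                self.rows[top] = (mask, value)
                break
            row_mask, row_value = self.rows[top]
            mask ^= row_mask  # eliminate the leading unknown
            value ^= row_value
        # If mask reached 0, the packet was redundant and is simply ignored.
        return len(self.rows) == self.k

    def solve(self):
        """Back-substitute from the lowest-numbered unknown upward."""
        blocks = [0] * self.k
        for top in range(self.k):
            mask, value = self.rows[top]
            for j in range(top):
                if mask >> j & 1:
                    value ^= blocks[j]
            blocks[top] = value
        return blocks

rng = random.Random(2001)
source = [rng.getrandbits(64) for _ in range(16)]  # 16 blocks of random data
decoder = Decoder(len(source))
sent = received = 0
done = False
while not done:
    sent += 1
    packet = encode_packet(source, rng)
    if rng.random() < 0.3:  # roughly 30% of packets dropped; which ones doesn't matter
        continue
    received += 1
    done = decoder.add(*packet)

assert decoder.solve() == source
print(f"sent {sent}, received {received}, needed at least {len(source)}")
```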

Would be nice if it works... by voronoi++ · 2001-12-13 02:26 · Score: 1

Interesting idea, I wonder how it could work in practice.

They are going to have to deal with flow control, dropped packets, etc... I wonder what happens if the receiver crashes?

I have a feeling that they may be sending quite a bit of redundant data (perhaps similar to the way CDs are encoded at the hardware level), and they are betting that the signal to noise ratio is good enough for error correction software to deal with it. With a bit of luck they should be able to use more of the bandwidth available.

I wonder what would happen if lots of people start using something like this. Would the extra bandwidth actually slow things down, even though an individual download was faster?

I wish I had more energy left after doing my day job, since this sounds like a fun side project...

FTP doesn't require in order delivery?!?1? by hammy · 2001-12-13 02:27 · Score: 1

I didn't think FTP required packets to arrive in sequence. Although FTP and TCP acknowledge packets in sequence, they don't actually require the packets to be received in sequence.

They don't really make clear why this product is interesting at all except giving it a cute name.

Think Reliable Multicast + XOR Recovery by hughk · 2001-12-13 02:29 · Score: 2

This is basically what the guys doing reliable multicast get up to plus what you do for tape backups. It isn't particularly new.

You create your data in records and groups. Each group contains a longitudinal XOR of the other records within the group. This comes from tape backups that were proof against bad spots, and it was later used in RAID.

You then sequence and shoot your data in records across the network. If one record is dropped, it can be recreated through the XOR redundancy record. If two records are dropped, you need a rerequest mechanism. This can be either on UDP or via a separate TCP link.
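The record-group scheme described above is easy to sketch. The parity record is the XOR of every data record in the group, so any single missing record equals the XOR of everything that did arrive. Below is a minimal Python sketch, assuming equal-length records. The function names and sample records are hypothetical.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length records."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_group(records):
    """Append a longitudinal XOR parity record to a group of equal-length records."""
    return list(records) + [reduce(xor_bytes, records)]

def recover(received):
    """`received` is the group (data records + parity) with lost entries set to None.
    One loss is rebuilt from the XOR of everything that arrived; two or more
    losses need a re-request, signalled here by an exception."""
    lost = [i for i, r in enumerate(received) if r is None]
    if len(lost) > 1:
        raise ValueError(f"{len(lost)} records lost: re-request needed")
    group = list(received)
    if lost:
        group[lost[0]] = reduce(xor_bytes, [r for r in received if r is not None])
    return group[:-1]  # data records only; the parity record has done its job

records = [b"rec-0001", b"rec-0002", b"rec-0003"]
sent = make_group(records)
arrived = [sent[0], None, sent[2], sent[3]]  # record 1 dropped in transit
assert recover(arrived) == records
```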

If you want an example of prior art, go to the trading room of a bank. How do you think all those prices are delivered to every workstation?

--
See my journal, I write things there

michael didn't read the article carefully, I guess by rlowe69 · 2001-12-13 02:29 · Score: 2

...wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together?

michael, this is not what the product does. From the article:

By translating a packet stream into mathematical elements, the company eliminates the back-and-forth transactions that confirm whether data has reached its destination. In the Digital Fountain approach, the receiving end waits until it has received a certain number of packets, then signals the transmitting side to stop sending. The operation doesn't require a network processor, but relies instead on the computational power of standard PC processors.

The quirk is that none of the data is ever transmitted; the receiving end creates its own copy of a file based on a complete set of mathematical equations.

It appears as though the signal is broken down into equations that, when combined, produce the original data. These equations are all sent from the same server to the destination client. The speed increase then comes from the fact that the size of the equations is less than the size of the data.

The article does not mention that the equations come from multiple servers, which is a very big difference! IMO, this technology is much more newsworthy than yet another multi-server downloading tool like Kazaa.

--
----- rL

Read the article! by alt.sex.fetish.jesus · 2001-12-13 02:29 · Score: 1

> User would go to download a game demo or something,
> receive pieces from several different places, and knit
> them together?

This technology has *nothing* to do with ``downloading chunks from multiple sources and splicing them together''. Man, it's bad enough seeing how many Slashdot readers didn't bother reading the article, but Michael himself didn't bother reading the article.

Isn't this a contradiction? by Anonymous Coward · 2001-12-13 02:29 · Score: 0

First the article says this:

"If you get 98 percent of the packets, you get nothing."

Then it says this:

"The arrangement saves time because neither side cares if a packet gets dropped, thus eliminating the dialogue required by TCP and FTP."

Huh? Well if it doesn't care that a packet gets dropped, but it still needs all the packets, how does the darn thing do anything at all?

Re:Not a new concept - mod down parent by Omnifarious · 2001-12-13 02:30 · Score: 1

BTW, this post needs to be modded down rather badly as the person who wrote it doesn't seem to have read the article, or if (s)he did, (s)he misunderstood it badly.

--
Need a Python, C++, Unix, Linux develop

Mathematical Equations? by Anonymous Coward · 2001-12-13 02:32 · Score: 0

Can they get any more fancy? "...based on a set of mathematical equations, it creates the data on the other side..." This sounds like generating a script that generates the data on the other side and sending over the script. Another way of seeing it: there is a grammar generator on the server side for the language the input belongs to. The grammar is then parsed on the client side and the input is regenerated.

The question is, how much more or less data is a grammar for a large input that has little repetition? And why UDP? Isn't UDP unreliable?

How would you mathematically encode random data? by boltar · 2001-12-13 02:34 · Score: 0

I can't think of an equation that could be used to describe a file consisting of truly random data, can you? For example, a setiathome file that consists of white noise off the radio telescope. Sure, you could do an FFT on it and keep only the main components, but that is a type of lossy compression. You won't be able to reconstruct the file EXACTLY from it, which is vitally important with computer data.
So can someone explain how they would get round what seems to me an insurmountable problem?

Mostly Old Tech by trongey · 2001-12-13 02:36 · Score: 1

This was done many years ago. There was a program in one of the rags for the Radio Shack CoCo that would take a large (16k in those days) file and build a tiny BASIC program that would reconstruct it when run.

This is a bit more sophisticated since it can rebuild with missing packets, but the general function is the same.

Still a good idea overall; at least until all of the transfer progs are full of Trojans.

--
You never really know how close to the edge you can go until you fall off.

How it works by gargle · 2001-12-13 02:38 · Score: 5, Insightful

The EETimes article is extremely poorly written.

The technique used by Digital Fountain is called Forward Error Correction. It allows a message M with m parts to be encoded into n parts, where n > m. The interesting thing about this is that any m of the n parts will allow the original message M to be reconstructed.

This means that if a part of the message is missed, the receiver doesn't have to request a resend .. it just continues listening. This is especially cool for multicast transmission since even if receivers A and B miss different parts of the message, the broadcaster doesn't have to send the missed parts of the message to the different receivers - it just continues broadcasting since any part can substitute for any other part.
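The "any m of the n parts" property is what coding theorists call a maximum distance separable (MDS) erasure code, and Reed-Solomon is the classic example. Below is a minimal Python sketch of the idea. It assumes arithmetic modulo the prime 257 rather than the GF(2^8) arithmetic real implementations use, so a parity symbol can come out as 256. The m message symbols define a polynomial of degree below m, and each part is one point on it. Any m points pin the polynomial down again. The Tornado and LT codes discussed elsewhere in this thread trade this exactness for speed and need slightly more than m parts.

```python
import random
from itertools import combinations

P = 257  # a prime, so arithmetic mod P is a field; every byte value 0..255 fits

def lagrange_at(points, x0):
    """Value at x0 of the unique polynomial of degree < len(points)
    passing through the given (x, y) points, computed mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # pow(den, P-2, P) = 1/den mod P
    return total

def encode(message, n):
    """Systematic encoding into n parts (n < P): parts 1..m are the message
    symbols, parts m+1..n are further points on the same polynomial."""
    m = len(message)
    points = list(zip(range(1, m + 1), message))
    return points + [(x, lagrange_at(points, x)) for x in range(m + 1, n + 1)]

def decode(parts, m):
    """Rebuild the m message symbols from any m distinct parts."""
    return [lagrange_at(parts[:m], x) for x in range(1, m + 1)]

message = list(b"HELLO")             # m = 5 symbols
parts = encode(message, n=8)         # 8 parts in total
survivors = random.sample(parts, 5)  # any 5 of the 8 will do
assert bytes(decode(survivors, 5)) == b"HELLO"
```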

Re:How it works by schon · 2001-12-13 02:47 · Score: 1

The technique used by Digital Fountain is called Forward Error Correction. It allows a message M with m parts to be encoded into n parts, where n > m

No, that's called parity, and it's been used for decades. If anything, it's even LESS revolutionary than compression.
Re:How it works by Raphael · 2001-12-13 02:53 · Score: 2

The technique used by Digital Fountain is called Forward Error Correction.

Several Forward Error Correction techniques are used in cellular networks, for example. If you have a GPRS or (soon) UMTS phone, it is already using some of the techniques that appear to be incredibly advanced if you read the EETimes article. But in fact, all these things are well known. I think that several DAB/DVB (Digital Audio/Video Broadcasting) protocols do the same as well.

--
-Raphaël
Re:How it works by Hieronymus+Howard · 2001-12-13 03:03 · Score: 2

DVB does indeed use FEC. I think that the standard FEC ratio used by Sky in the UK is 2/3 (i.e. where n is 50% greater than m).

HH
Re:How it works by Spazmania · 2001-12-13 11:17 · Score: 1

The technique used by Digital Fountain is called Forward Error Correction. It allows a message M with m parts to be encoded into n parts, where n > m. The interesting thing about this is that any m of the n parts will allow the original message M to be reconstructed.

So, set:
n=infinity
stop condition=no current listeners
router priority=all waiting unicast traffic first, discard if multicast queue full (congestion control)

And then let the client subscribe to as much of the stream as he can handle. As soon as he has m packets, he's done and unsubscribes.

Can we actually do that? Are there any FEC codes that allow for an unbounded n?

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.

read then speak, not distributed, not compression by buridan · 2001-12-13 02:39 · Score: 2, Informative

ok, people, you seem to be far away from your thinking apparatus.
compression takes data and applies an algorithm to it to generate new data that is a representation of the whole.

contrary to that, this transforms the data into an algorithm that simulates the whole.

it is a minor point of order, but one worth thinking about because it is theoretically different.

this is also not distributed, it is point to point. I don't know where the submitter got the distributed model from; I suspect he pulled it out of the air, but this is very clearly not that. It requires a machine at each end of the point-to-point link.

however, logically, all data that is not random can be represented by equations that are smaller than the data. However, the real load lies in generating the equations from the data, not in reconstructing the data itself. So to me this seems quite a possible project, though I suppose it will take quite a bit of optimization.

one other thing: the article does not say that it uses a standardized technique, and it would be interesting if they did not use standard vector analysis or the like. If they used vectors, then this could just reduce to a system of file conversion from one standard to another. I think it would be far more interesting if it were what it claims to be: a conversion of the file into a mathematical simulation of the file.

FTP Unreliable? by Tim+C · 2001-12-13 02:39 · Score: 2

From the article:

"FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes," said Charlie Oppenheimer, vice president of marketing at Digital Fountain.

Maybe I'm misunderstanding the quote, but I semi-regularly download 600+ MB ISO images and other multi-hundred-meg files over FTP links at work, and I've never had problems. It's probably just me, but that really sounds like a load of bull, designed to whip up interest in an otherwise lacklustre product (given its high price).

Cheers,

Tim

--
It's official. Most of you are morons.

Re:FTP Unreliable? by Fjodor42 · 2001-12-13 03:09 · Score: 1, Insightful

It is probably a typo. There would be no need to use tapes to mail that amount of data either.
It should most likely read something like "20Gb".

--
"The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."

More details of their method by mfg · 2001-12-13 02:40 · Score: 2, Informative

See http://www.digitalfountain.com/getDocument.htm/technology/DF_MetaContentWhitePaper_v4.pdf

The RFC by gargle · 2001-12-13 02:42 · Score: 5, Informative

http://www.ietf.org/internet-drafts/draft-ietf-rmt-info-fec-01.txt

The use of Forward Error Correction in Reliable Multicast

Enjoy.

Sounds like CELP by Anonymous Coward · 2001-12-13 02:46 · Score: 1, Interesting

Hmmm. Let's see. Instead of transmitting the data, put together a set of mathematical equations which can be used to replicate the data, then transmit a description of those equations.

Sounds like Code Excited Linear Prediction. Linear prediction uses an equation which can be used to replicate a data stream. Code excited, in this case, means you have a variety of equations to choose from, and you can plug in different coefficients for the various equations. Consequently, all you need to transmit are:

which equation to use (if there are 256 equations to choose from, that's an 8-bit value)
what coefficients to use with those equations
how long to follow the resulting data sequence, before you need to change the values

CELP was applied particularly to speech compression. The DOD was using it with 4800 bps (yes, that's 4.8 kb/s, about 1/2 - 2/3 the bandwidth used by PCS cellphones) modems and encryption systems to transmit secure voice.

It sounds to me like they've got a more general selection of equations, and the resulting datastreams are interleaved. If you don't get the information for all the equations, you can't accurately reconstruct the data. Additionally, since you don't have to respond to every packet, you reduce the turnaround.

Unless I'm mistaken, Fibre Channel already does the latter part.

In short: a more general application of existing compression schemes and protocols.

95% of what you read about these days is just mucking around with new combinations and applications for existing inventions.

Forward Error Correction, MojoNation by Adam+J.+Richter · 2001-12-13 02:49 · Score: 2, Informative

The technique is called Forward Error Correction. I don't know much about it, but I know that you can do things like break up a file of N*B bytes into 2N blocks of B bytes each and then be able to reconstruct the file from any N of the 2N blocks. The GPL'ed MojoNation system uses it, if I recall correctly, to distribute each 32kB chunk of data into eight 8kB blocks, allowing reconstruction from any four of them.

Might Be more Related than you think by Anonymous Coward · 2001-12-13 02:50 · Score: 3, Informative

While not actually related, John Byers, Michael Luby and Michael Mitzenmacher wrote a paper on using Tornado codes to speed up downloads. Basically, what they propose is clients accessing a file from more than one mirror in parallel and using erasure codes to make the system feedback-free.

Those "Tornado Codes" you mentioned are coauthored by one of the executives of this company (Luby is CTO).

Nice combination, but nothing fundamentaly new. by kzanol · 2001-12-13 02:50 · Score: 1

This sounds like quite a few old ideas have been combined in a new(?) way:

High-grade compression. The system they describe sounds quite similar to the wavelet compression used for image data. Main difference: algorithm-based compression usually works best for sound/video data, where slight differences between the original and reconstructed image are acceptable. You look for an algorithm that approaches the original, but it doesn't have to be a 100% match. For application data, there obviously mustn't be any information loss.
Downloading information from several sources: this is done by lots of utilities, such as some FTP clients, Swarmcast and Kazaa/Morpheus, to name just a few.
Sliding-window IP data transfer: also old stuff. You don't even have to use UDP for this; support is built right into the TCP protocol. You can easily have a TCP connection with a large window size, i.e. the amount of data that can be "on the fly" before an acknowledgement is required. Packet order isn't an issue either; TCP is perfectly happy to reorder the packets on arrival. This way you avoid the problems introduced by the high-bandwidth/high-latency connections mentioned in the article. Only really stupid applications/protocols suffer from these effects nowadays. If your application won't send the next packet before receiving an acknowledgement for the previous one, your performance will obviously suck.
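The point about window size comes down to one formula: a single TCP connection can move at most one window of unacknowledged data per round trip. Below is a quick Python sketch. The 500 ms round trip and the link speed are illustrative numbers, not figures from the article. 65,535 bytes is the largest window TCP can advertise without the RFC 1323 window-scale option.

```python
def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound for one connection: one window of data per round trip."""
    return window_bytes * 8 / rtt_s

def window_needed_bytes(link_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_bps * rtt_s / 8

rtt = 0.5  # illustrative: roughly a geostationary satellite round trip
print(max_throughput_bps(1460, rtt))    # stop-and-wait, one 1460-byte packet per RTT: 23360.0 bit/s
print(max_throughput_bps(65535, rtt))   # largest unscaled window: 1048560.0 bit/s
print(window_needed_bytes(45e6, rtt))   # window needed to fill ~45 Mbit/s: 2812500.0 bytes
```

The stop-and-wait line is the "won't send the next packet before the acknowledgement" case. No matter how fast the link is, it is limited to one packet per round trip.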

Considering this, the most noteworthy thing about the article seems to be the ability of their marketing folks to make this actually sound new and interesting.

--
you have moved your mouse, please reboot to make this change take effect

Luby Transform Codes by Detritus · 2001-12-13 02:52 · Score: 5, Interesting

After looking through some of the material on the company's web site, this product appears to be based on LT (Luby Transform) coding. Each encoded packet is the result of XORing a randomly selected set of segments from the original file. When sufficient packets have been received, they can be used to reconstruct the original file. Insert magic algorithm. The nice thing about this is that the transmitter can continually stream packets, and a receiver can start collecting packets at any point in time. When the receiver has collected sufficient packets, it can reconstruct the original file. Packet ordering is totally irrelevant. You just need enough packets to generate a unique solution. The math for the code has not been published yet, but this is supposed to be a successor to Tornado codes, which have been described in the literature.
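The LT idea described above can be sketched in Python. This is a toy under stated assumptions, not Digital Fountain's implementation. It uses the textbook "ideal soliton" degree distribution, whereas practical LT codes use a "robust soliton" variant that needs far fewer extra packets. Segments are plain integers so XOR is Python's `^`, and every function name is made up. The sender yields an endless stream of packets. The receiver can start at any point and keeps "peeling" until every segment is known.

```python
import random

def ideal_soliton(k):
    """Degree weights: P(1) = 1/k and P(d) = 1/(d(d-1)) for d = 2..k."""
    return [1 / k] + [1 / (d * (d - 1)) for d in range(2, k + 1)]

def lt_stream(segments, rng):
    """Endless stream of packets, each (set of segment indices, XOR of those segments)."""
    k = len(segments)
    weights = ideal_soliton(k)
    while True:
        degree = rng.choices(range(1, k + 1), weights=weights)[0]
        chosen = set(rng.sample(range(k), degree))
        value = 0
        for i in chosen:
            value ^= segments[i]
        yield chosen, value

def strip_known(indices, value, known):
    """XOR already-recovered segments out of a packet; return what is left."""
    remaining = set()
    for i in indices:
        if i in known:
            value ^= known[i]
        else:
            remaining.add(i)
    return remaining, value

def peel_decode(k, stream, max_packets=100_000):
    """Peeling decoder: a packet with one unknown segment left reveals that
    segment, which is then XORed out of every pending packet that uses it."""
    known, pending = {}, []
    for count, packet in enumerate(stream, start=1):
        queue = [packet]
        while queue:
            indices, value = strip_known(*queue.pop(), known)
            if len(indices) == 1:
                (i,) = indices
                known[i] = value
                queue += [p for p in pending if i in p[0]]
                pending = [p for p in pending if i not in p[0]]
            elif indices:
                pending.append((indices, value))
        if len(known) == k:
            return [known[i] for i in range(k)], count
        if count >= max_packets:
            break
    raise RuntimeError("not decoded within max_packets")

rng = random.Random(7)
segments = [rng.getrandbits(32) for _ in range(20)]
decoded, used = peel_decode(len(segments), lt_stream(segments, rng))
assert decoded == segments
print(f"decoded {len(segments)} segments from {used} packets")
```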

--
Mea navis aericumbens anguillis abundat

Re:Luby Transform Codes by addaon · 2001-12-13 03:15 · Score: 1

"the transmitter can continually stream packets"

Seems like something like this would be a life-saver in the multicast arena. Currently, if you want to multicast something and allow a client to pick up anywhere, you have to multicast a repeating loop. The client picks up anywhere in the loop, then detects (usually via numbered sections) when the loop has gone all the way around. With this method, most of the bookkeeping overhead would be removed, or rather, would be inherent in the protocol. For you and me, writing a chat room, it hardly makes a difference... but for a company like Akamai, or for huge mirror sites, this might be a big deal.

--

I've had this sig for three days.
Re:Luby Transform Codes by asldihf · 2001-12-16 15:35 · Score: 1

Digital Fountain's transport, in discussion here, takes a data set (i.e. 01010100001010) and creates a continuous flow of unrelated linear equations describing the dataset. Then it packs these random equations into UDP payloads and transmits them to the receiver. Each packet is unsequenced and independent of the others, which makes loss a moot point, since the next packet coming down the pipe is just as valuable as the one lost. When the receiver gets enough packets, it uses the equations it has gathered to 'solve' for the original dataset. Also, the overhead is minimal: if it takes 1000 packets to send a file in TCP, it takes 1050 with Digital Fountain.

Don't Worry. by McFly777 · 2001-12-13 02:53 · Score: 1

Given current patent practices, none of us will be able to touch this for 17 years. Even longer (life + 70) if they can claim that this is a copyright + content control (à la DMCA) technology.

--McFly777

--

McFly777
- - -
"What do people mean when they say the computer went down on them?" -Marilyn Pittman

security by night37 · 2001-12-13 02:55 · Score: 2, Offtopic

I wonder how hard it would be to hijack a UDP-based session like this. What if bogus packets are injected along with the stream of valid ones? Does the math include any form of encryption? Or is this a tunnel for other protocols? Damn it, we need to move away from clear-text protocols, not create new ones!

Re:security by xercist · 2001-12-13 04:57 · Score: 1

All that would have to be done is to send a simple MD5 or SHA1 of the data, cryptographically signed by the trusted source. This way it's obvious if something has gone wrong because the checksum doesn't match.
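This suggestion fits an erasure-coded transfer well, because the check only needs to run once, on the fully reconstructed file. Below is a minimal Python sketch using the standard `hashlib` and `hmac` modules. How the digest is obtained and how its signature is checked are assumptions left out of scope. SHA-1 matches the post, but today SHA-256 (`algorithm="sha256"`) would be the safer choice.

```python
import hashlib
import hmac

def verify_download(data, trusted_digest_hex, algorithm="sha1"):
    """Check a reconstructed file against a digest published by the trusted source.
    Verifying the source's signature on that digest is assumed to happen separately."""
    actual = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(actual, trusted_digest_hex)

published = hashlib.sha1(b"original file contents").hexdigest()
assert verify_download(b"original file contents", published)
assert not verify_download(b"original file contents, with an injected packet", published)
```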

--

--
grep "xercist" /dev/random ...you'll find me in there someday

A bit insulting by srichman · 2001-12-13 02:58 · Score: 2

Maybe, but comparing what GetRight et al. do (parallel downloading from FTP mirrors) to how Digital Fountain would achieve the same thing (erasure codes) is like comparing cups and string to IP telephony. Sure, they achieve the same thing, but the comparison is a bit insulting...

With the GetRight solution, 100 people downloading from my FTP mirror = 100 TCP streams worth of bandwidth and CPU consumption. With the Digital Fountain solution, 100 people downloading from my mirror = 1 stream worth of bandwidth and CPU consumption.

Re:A bit insulting by Anonymous Coward · 2001-12-13 04:27 · Score: 0

Yes, comparing GetRight, Kazaa, what_have_you with this approach is a bad idea.

They're two completely different technologies.
Re:A bit insulting by Anonymous Coward · 2001-12-13 06:14 · Score: 0

so it is management and server side overhead we are losing...cool thanks

um .. Opencola by xagon7 · 2001-12-13 02:58 · Score: 1

http://www.opencola.org

Lots of programs do that by StrawberryFrog · 2001-12-13 02:59 · Score: 2

A Win32 FTP client called GetRight has been capable of doing this for several years.

They called it "Segmented Downloads", ie the program would hit multiple ftp sites for the same file, doing a resume to get different parts at the same time. Heck, it would even locate the mirrors for you.

And yes, it caused a substantial improvement in download speed. It seems thus that the bottleneck is seldom in the last mile.

But this has little to do with the article, which as far as I can tell is mostly gibberish.

--

My Karma: ran over your Dogma
StrawberryFrog

Re:Lots of programs do that by ret · 2001-12-13 05:09 · Score: 1

Does anyone have any info on how getright actually works, like technical info?

mathematically speaking, what getright does should make no difference (although I've used it and yes, downloads were consistently faster). From what I can tell, getright contacts the ftp server, makes 4 or 5 connections to the same server, and splits the file into 4 or 5 chunks. Normally this should not work. The exceptions are when you are on a very fast connection, downloading from a very fast server, with routers somewhere in between (or a limiter on the ftp server) that only allow a fraction of your possible download speed per connection.

For example, say you have a 1MB file and a 56k modem, so you can download at 5.6k/sec (yes yes, not realistic numbers, the speed isn't that consistent and rarely right on like that, blah blah). Either you download the 1MB at 5.6k/sec, or you download four 256k files at the same time at about 1.4k/sec each. Either way, it comes out to the same amount of time downloading. You've only got 5.6k/sec worth of inbound for all connections combined. Unless they do something very strange, the speedup is nothing but a placebo effect in most cases.

Now, if you have a 1.5mb/sec cable connection and you are downloading from say a t1 with no traffic but you and the ftp is setup so that each connection can only get 384k/sec, then yes, this would work... instead of 1 384k connection, you'd have 4 384k connections at once downloading the file, but how often is this the case?
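The arithmetic in the last two paragraphs reduces to one rule of thumb. The aggregate rate is the smaller of your own link and the sum of whatever each connection is individually allowed. Below is a small Python sketch of that model, which ignores TCP slow start, packet loss and protocol overhead. The function names are hypothetical, and the rates reuse the illustrative numbers above.

```python
import math

def effective_rate(client_link, per_connection_caps):
    """Aggregate download rate: each connection is held to its own cap (a server-side
    throttle, or that server's share of its uplink), and all of them together are
    held to the client's link."""
    return min(client_link, sum(per_connection_caps))

def download_seconds(size, client_link, per_connection_caps):
    """Time to fetch `size` units at the aggregate rate (same units as the rates)."""
    return size / effective_rate(client_link, per_connection_caps)

UNCAPPED = math.inf
# 56k modem (about 5.6 KB/s): splitting into 4 connections changes nothing.
print(effective_rate(5.6, [UNCAPPED]), effective_rate(5.6, [UNCAPPED] * 4))  # 5.6 5.6
# 1.5 Mbit/s cable, server allows 384 kbit/s per connection (rates in kbit/s).
print(effective_rate(1500, [384]), effective_rate(1500, [384] * 4))          # 384 1500
```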
--
Re:Lots of programs do that by InsaneGeek · 2001-12-13 05:45 · Score: 2

The splitting is really meant for multiple ftp servers presenting the same file, not for going to the same web server multiple times... like this:

Your box A sits on a T3

Ftp server B sits on a T1
Ftp server C sits on a different T1
Ftp server D sits on another different T1

The next three processes occur at the same time:

box A ftp's to server B and starts the request (getting the first 1/3)
box A ftp's to server C and issues a reget a third of the way into the file (getting the middle 1/3)
box A ftp's to server D and issues a reget two-thirds of the way into the file (getting the last 1/3)

Maximum bandwidth from any one site is a single T1, you are then able to download the file at 3xT1 (~4.5mb). More often than not, with broadband it's not the end user who's got a bottleneck, but the end site who's got a T1, but has 1000 other people requesting files at the same time.
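The byte arithmetic behind those three parallel transfers is simple: split the file into contiguous ranges and give each server one. Below is a Python sketch. The server names are just the hypothetical B, C and D from the example, and `split_ranges` is a made-up helper.

```python
def split_ranges(file_size, parts):
    """Split bytes [0, file_size) into `parts` contiguous (start, end) ranges, end exclusive."""
    base, extra = divmod(file_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# Each connection would resume ("reget", FTP REST) at `start`. REST takes no end
# offset, so the client stops reading once it has end - start bytes.
for server, (start, end) in zip(["B", "C", "D"], split_ranges(100, 3)):
    print(f"server {server}: bytes {start}..{end - 1}")
# server B: bytes 0..33
# server C: bytes 34..66
# server D: bytes 67..99
```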
Re:Lots of programs do that by StrawberryFrog · 2001-12-13 06:54 · Score: 1

From what I can tell with getright is it contacts the ftp server, it makes 4 or 5 connections to the same server
Nope. 4 or 5 different servers.

--
My Karma: ran over your Dogma
StrawberryFrog
Re:Lots of programs do that by ret · 2001-12-13 06:54 · Score: 1

yes, if it was getting from multiple sites that could work, but getright makes multiple connections to the same server, which means it doesn't (or shouldn't) make any difference except in rare cases.
--
Re:Lots of programs do that by ret · 2001-12-13 06:57 · Score: 1

I've used getright several times, it connected all to the same server. it could not know the different servers and paths to the file on those different servers without you specifying each one and I know I have never done that, just had it open, clicked on the link for the one server, and away it went making several connections to the 1 server. Perhaps I was using it wrong, but I wouldn't imagine that they would make it more complicated than doing that because the general public (most of the people who would be using it) couldn't figure out this whole specifying different servers and the paths to the files on each server, etc.
--
Re:Lots of programs do that by Mr+Z · 2001-12-13 07:09 · Score: 1

It could make a difference when the server is a load-balanced cluster. Also, if an ACK gets dropped on one stream, the others can absorb the available bandwidth while that stream's retransmit timer runs out, which can be useful even if there's only one computer at each end...
--Joe

--
Program Intellivision!
Re:Lots of programs do that by InsaneGeek · 2001-12-13 07:32 · Score: 2

GetRight does make connections to multiple servers... check out http://www.getright.com/fastest.html under the "Segmented (Accelerated) Download" header; it describes this exact feature, including talking to multiple hosts.

Maybe you have it configured to always make multiple connections. I've never used the software, so I don't know if there is such a setting. (The only other reason I could see for this is sites that set a per-connection bandwidth limit; you could then get around the admin's settings.)
Re:Lots of programs do that by ret · 2001-12-13 07:44 · Score: 1

Hm, OK, so apparently it can... guess that'll teach me to RTFM, eh? It still doesn't do it by default, though, so out of the box it really doesn't do shit in most cases... oh well, at least it has the functionality if a person feels like reading up on it, which is better than I thought.
--
Re:Lots of programs do that by zeno_2 · 2001-12-13 08:35 · Score: 1

I've seen Download Accelerator connect to a single FTP site multiple times. I think it does this when it can't find any mirrors. But I've even seen that help.

Let's say an FTP site throttles each connection to 5 KB/s. With Download Accelerator, I can connect to the server up to 5 times, and each of the 5 connections seems to get its own 5 KB/s, so 25 KB/s total.
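Why the per-connection cap adds up can be sketched by assuming the site throttles with a token bucket per connection (an assumption; the post doesn't say how the site limits rates). Each new connection gets its own bucket, so five connections drain five buckets. The class and function names are hypothetical.

```python
class TokenBucket:
    """Per-connection limiter: refills at `rate` bytes/s, holds at most `capacity` bytes."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity

    def refill(self, dt: float) -> None:
        self.tokens = min(self.capacity, self.tokens + self.rate * dt)

    def take(self, want: float) -> float:
        """Send up to `want` bytes and return how many the bucket allowed."""
        sent = min(want, self.tokens)
        self.tokens -= sent
        return sent


def throttled_total(connections: int, rate: float, seconds: float, step: float = 0.01) -> float:
    """Bytes delivered when every connection has its own bucket and always wants more data."""
    buckets = [TokenBucket(rate, capacity=rate * step) for _ in range(connections)]
    total = 0.0
    for _ in range(round(seconds / step)):
        for bucket in buckets:
            bucket.refill(step)
            total += bucket.take(float("inf"))
    return total


print(throttled_total(1, 5_000, 1.0))  # about 5000 bytes: one connection at 5 KB/s
print(throttled_total(5, 5_000, 1.0))  # about 25000 bytes: five connections
```

A site that wanted to stop this would have to key the bucket on the client address instead of the connection.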

But I got annoyed that sometimes these programs come up when I download something and sometimes the browser's default download window comes up instead. I've just gone back to using what's built into the browser.

Zeno
Re:Lots of programs do that by waitdyahoo.com · 2001-12-13 08:39 · Score: 1

Sorry if this has already been answered, but one reason opening multiple downloads from the same site can go faster is that servers never give each download the maximum available bandwidth, because bandwidth is a shared resource.

So when you start more than one download from the same server, each one may average slightly slower, but the total download rate is higher.

And as someone else brought up, most sites have more than one server, so even if it's the same name, more often than not it's two different machines in a cluster.
Re:Lots of programs do that by igrek · 2001-12-13 09:15 · Score: 2

The main difference is that GetRight and other programs of the same type work on top of TCP/IP, while the article talks about a UDP-based protocol.

The idea is to get rid of TCP/IP overhead, as far as I understand.
Re:Lots of programs do that by reverius · 2001-12-13 15:47 · Score: 2

I believe you were using GetRight wrong.

The way it actually works is that you tell it to search for mirrors (using an ftp search site like ftpsearch.lycos.com) and it finds a bunch.

Over my connection, downloading a linux distro usually goes about 60k/sec from any single site.

Using the mirror searching and segmented downloading in GetRight, I've gotten 450k/sec from 8 sites at once.

This was with no effort on my part, except for clicking on the file, after I configured GetRight.

Swarmcast by Orasis · 2001-12-13 02:59 · Score: 5, Informative

The "Math" they use is called Forward Error Correction (FEC) and is the same stuff that the Swarmcast distributed download system is based off of (http://sf.net/projects/swarmcast/).

I am the creator of Swarmcast, and we at Onion Networks (http://onionnetworks.com/) already have a product that can provide file transfers just as fast as Digital Fountain, but ours is 3-5.5% more efficient and much much cheaper.

On the open source front, we are working on the next generation of the Swarmcast technology which will combine parallel downloads over HTTP, Multicast IP, and UDP (for tunnelling through NAT). We have started work defining a standard for this system called the "Content-Addressable Web" and hope to see it implemented in everything from Apache to Mozilla.

Please mod this up, people shouldn't be paying $150k for these types of technologies.

good point by HanzoSan · 2001-12-13 02:59 · Score: 1

Why aren't open source people making new protocols?

--
If you use Linux, please help development of Autopac

.PAR files by BMonger · 2001-12-13 03:04 · Score: 1

Something mildly along the same lines (I believe) is PAR files... you can learn more about them here. They can reconstruct missing files in a set (files that never made it onto your HD) from the ones you do have. This has recently become quite popular in newsgroups.

http://www.disc-chord.com/smartpar/index.html

A little math... by Cato+the+Elder · 2001-12-13 03:07 · Score: 2

You can only solve a linear system of two equations in two unknowns if the equations are independent. And you can't have more independent equations than you have variables. So your calculation is very optimistic. Imagine if I send you the following data:

X+Y=4
X+2Y=5
2X+4Y=10
2X+2Y=8

With 50% packet loss (as opposed to a 50% chance of loss per packet*) you only get two useful equations 2/3 of the time. Of course, these boxes are expensive and can do a lot more number crunching than just linear algebra. But of course, I wouldn't really have expected a good comparison from a company whose veep of marketing said "FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes" Clearly all those game demo sites on the net need to get with the program.

*Computing with an independent 50% chance of loss per packet gives:

1/16 chance of getting all four, 100% success
4/16 chance of getting 3 of 4, 100% success
6/16 chance of getting 2 of 4, 66.666% success
4/16 chance of getting 1 of 4, 0% success
1/16 chance of getting none, 0% success

or 9/16 (56.25%) overall, slightly worse odds than if you always got 2 of 4 (2/3).
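The footnote's arithmetic can be checked by brute force over all 16 arrive/lost patterns. This sketch assumes independent 50% loss per packet and counts a set of received equations as useful if any pair of them has a nonzero determinant. It gives 9/16, because the number of ways to receive 4, 3, 2, 1 or 0 of 4 packets is 1, 4, 6, 4, 1 out of 16; either way, this is worse than a guaranteed 2 of 4 (2/3).

```python
from fractions import Fraction
from itertools import product

# Each equation a*X + b*Y = c is stored as (a, b, c).
CATO = [(1, 1, 4), (1, 2, 5), (2, 4, 10), (2, 2, 8)]


def solvable(received):
    """True if some pair of received equations is linearly independent."""
    for i in range(len(received)):
        for j in range(i + 1, len(received)):
            (a1, b1, _), (a2, b2, _) = received[i], received[j]
            if a1 * b2 - a2 * b1 != 0:  # nonzero determinant
                return True
    return False


def success_probability(equations, p_loss=Fraction(1, 2)):
    """Sum over every arrive/lost pattern; packets are lost independently with p_loss."""
    total = Fraction(0)
    for pattern in product([True, False], repeat=len(equations)):
        prob = Fraction(1)
        for arrived in pattern:
            prob *= (1 - p_loss) if arrived else p_loss
        got = [eq for eq, arrived in zip(equations, pattern) if arrived]
        if solvable(got):
            total += prob
    return total


print(success_probability(CATO))  # 9/16
```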

Re:A little math... by gowen · 2001-12-13 03:20 · Score: 4, Insightful

your calculation is very optimistic. Imagine if I send you the following data:

X+Y=4
X+2Y=5
2X+4Y=10
2X+2Y=8
You made an astonishingly bad choice of equations. If I send you
X+Y=4
X+2Y=5
X+3Y=6
X+4Y=7
then you can find X=3, Y=1 from *any* pair of equations you receive.
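A small sketch of why this choice works: every equation has the form X + kY = 3 + k with a different k, so any two of them have a nonzero determinant, and Cramer's rule recovers the same answer from each pair. Exact fractions avoid rounding; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# a*X + b*Y = c stored as (a, b, c): X+Y=4, X+2Y=5, X+3Y=6, X+4Y=7
GOWEN = [(1, 1, 4), (1, 2, 5), (1, 3, 6), (1, 4, 7)]


def solve_pair(e1, e2):
    """Cramer's rule for two equations in X and Y."""
    (a1, b1, c1), (a2, b2, c2) = e1, e2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("equations are not independent")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


for e1, e2 in combinations(GOWEN, 2):
    print(e1, e2, solve_pair(e1, e2))  # (3, 1) for all six pairs
```

With Cato's set, the pairs (X+Y=4, 2X+2Y=8) and (X+2Y=5, 2X+4Y=10) raise the `ValueError`.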

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:A little math... by Anonymous Coward · 2001-12-13 07:13 · Score: 0

But... but... but...

isn't sending just "X=3, Y=1" simpler? It seems to take 1/4 of the bandwidth. You can even send it twice...
Re:A little math... by gowen · 2001-12-13 07:21 · Score: 2

isn't sending just "X=3, Y=1" simpler? It seems to take 1/4 of the bandwidth. You can even send it twice
Well, yes. But (as I understand, and my maths is better than my networking) you then need ACKs to state each data block has been received (i.e. not a stateless UDP connection).

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:A little math... by Cato+the+Elder · 2001-12-13 08:09 · Score: 1

I deliberately made a poor choice of equations; I was trying to show that you couldn't just pick any four equations. However, I had tried to find a bigger set of independent equations myself and couldn't. Probably because it was 7 AM and I'd been up all night.

However, we're not out of the woods yet: if you use the "chance per packet" calculation of loss, you still have a chance of not getting all the data, a 25% chance if a quick calculation doesn't mislead me. You can reduce this to an arbitrarily small percentage by sending more linearly independent equations, but that takes more bandwidth.
Re:A little math... by Anonymous Coward · 2001-12-13 09:10 · Score: 0

Well, gee. Maybe the protocol will be better at picking the equations than someone who is deliberately making poor choices.

"Look, linux sucks because I can configure it so badly it won't run!"

Asshole.
Re:A little math... by RovingSlug · 2001-12-13 09:40 · Score: 2

isn't sending just "X=3, Y=1" simpler? It seems to take 1/4 of the bandwidth. You can even send it twice
Well, yes. But (as I understand, and my maths is better than my networking) you then need ACKs to state each data block has been received (i.e. not a stateless UDP connection).
My understanding is that it's still stateful, just in the sense of the entire transmission rather than each packet. So, rather than being hard-set to send those four equations, the sender just knows the process to create them. And the sender keeps creating and sending until it receives that single ACK ("OK! Enough!" instead of "OK, got that one, next?"). If there's some overflow and the receiver gets a few extra packets, no biggie; it's still cheaper than the time and data that would have been spent handshaking along the way.
Re:A little math... by Anonymous Coward · 2001-12-13 11:49 · Score: 0

"Look, windows sucks because I can configure it so badly it won't run!"
Re:A little math... by Anonymous Coward · 2001-12-13 12:04 · Score: 0

Gee, the point being that you can't just send any n equations, as the parent post implied. It's called worst case analysis. But of course, an uneducated little troll like you wouldn't know that. The fact that you can apparently configure linux so that it will run shows that it has matured into a system that takes no thought to use.
Re:A little math... by gowen · 2001-12-13 22:00 · Score: 2

you still have a [25%] chance of not getting all the data
True, and if you send 'X=3', 'Y=1', 'X=3', 'Y=1' as four packets with 50% loss, you have a 7/16 (~44%) chance of not getting all the data (I think). So that's an improvement (modulo all the implementation/protocol overhead, I guess).

You can reduce this to an arbitrarily small percentage by sending more linearly independent equations, but that takes more bandwidth.
That's true of any algorithm you choose for encoding your data.
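Both failure figures can be computed in closed form, assuming independent 50% loss per packet. With gowen's equations any 2 of the 4 packets suffice, so the transfer fails only when fewer than 2 arrive; exact arithmetic gives 5/16 (about 31%), a little above the quick 25% estimate earlier in the thread. Sending each plain value twice fails when both copies of either value are lost, which reproduces the 7/16. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def any_k_failure(n, k, p):
    """Ideal "any k of n" code: fails only if fewer than k of the n packets arrive."""
    q = 1 - p
    return sum(comb(n, r) * q**r * p ** (n - r) for r in range(k))


def repetition_failure(copies, values, p):
    """Each value sent `copies` times: fails if every copy of some value is lost."""
    return 1 - (1 - p**copies) ** values


half = Fraction(1, 2)
print(any_k_failure(4, 2, half))       # 5/16
print(repetition_failure(2, 2, half))  # 7/16
```

The gap widens as more packets are sent: the coded scheme only needs enough packets in total, while repetition needs at least one copy of every single value.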

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.

Tornado codes by srichman · 2001-12-13 03:08 · Score: 5, Informative

This technology (from the write-up anyway) uses some kind of proprietary technique to re-map the data into another domain and send the information required to reproduce it. It sounds kind of like sending a waveform as a series of Fourier coefficients rather than as actual data samples.

Actually, they use Tornado codes (or a proprietary update thereof), an erasure code. That is, they use forward error correction to encode streaming data or a software distribution over a single (or multiple) client-independent streams of multicast. After a client grabs enough packets, it can reconstruct the source file.

Re:read then speak, not distributed, not compressi by underpaidISPtech · 2001-12-13 03:08 · Score: 1

Please mod parent up, posters above this are STILL blathering on about distributed data transmission.

Read the article, people. It's NOT anything like Kazaa, Morpheus, MojoNation or the like. It's about sending data as a mathematical representation of the data, not the data itself. And you only need a portion of the whole equation to arrive at the "solution". That means fewer actual packets being transmitted, and less overhead from packet loss and retransmission.

Preemptive error-correction by Kieckerjan · 2001-12-13 03:11 · Score: 1

A company I used to work at experimented with a form of "preemptive" error correction. They built a system that, instead of waiting for a retransmission timeout, would resend a packet whenever the sender had not yet received its acknowledgement. The trick was that this resent packet would be XOR'ed with other packets (possibly a fresh packet). The probability that a packet was lost along the way was dynamically computed and used to determine which packets should be combined and resent in order to maximize the probability that all packets would be reconstructed. This computation could be done very fast, in a finite state machine, which could be implemented in hardware quite nicely.
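A minimal sketch of the core XOR trick, assuming equal-length packets and a single hard-coded pairing; the system described chose its combinations from a measured loss probability, which this does not attempt. If packets A and B are both unacknowledged, one speculative resend of A XOR B repairs whichever one of them was lost.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must be the same length (pad them first)")
    return bytes(x ^ y for x, y in zip(a, b))


# Two packets are in flight and neither has been acknowledged yet.
pkt_a = b"packet-A"
pkt_b = b"packet-B"
repair = xor_bytes(pkt_a, pkt_b)  # one resend covers both of them

# Whichever single packet was lost can be rebuilt from the one that arrived.
print(xor_bytes(repair, pkt_b))  # b'packet-A' (A was lost, B arrived)
print(xor_bytes(repair, pkt_a))  # b'packet-B' (B was lost, A arrived)
```

If both were lost, one combined packet is not enough, which is where choosing combinations from the estimated loss rate comes in.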

--
Being well balanced is overrated. -- John Carmack

ECC by Anonymous Coward · 2001-12-13 03:12 · Score: 0

It's related to ECC, or Error Correcting Codes, the same little wonders that allow you to drill a small hole in your CD and not lose information (try it!). There's enough redundancy to reconstruct the original bit image.

RAID by KarmaBlackballed · 2001-12-13 03:12 · Score: 1, Troll

The concept of redundantly spreading information so you can construct the original from less than all the parts is the basis of RAID5. I first heard of RAID5 in the mid 80's. The point is that I see no new concepts at play here. Just the application of existing concepts to move files.

More power to them if they can do it right. Earth shattering innovation? Not at all.

--

--- -- - -
Give me LIBERTY, or give me a check.

Raid 5 Errors...Disks by tonywestonuk · 2001-12-13 03:13 · Score: 3, Insightful

I expect this system is a little like RAID 5, used on hard drives. E.g., with 5 disks, if 1 goes down, you still have enough data to restore the failed drive. This seems similar to being sent 5 million UDP packets: 1 million get lost on the way, but you can still piece together perfectly what you wanted from the remaining 4 million.

Re:Raid 5 Errors...Disks by masl · 2001-12-13 06:39 · Score: 1

But in a RAID 5 array, while you can lose one hard drive completely without losing data, you can't lose bits randomly on every disk of the array, the way UDP drops its packets.
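A sketch of RAID 5 style parity for a single stripe (block contents and helper names are made up). XORing the surviving blocks with the parity block rebuilds one missing block, which is why a whole failed drive (one block per stripe) is recoverable; two losses in the same stripe are not, which is the problem with random packet loss.

```python
from functools import reduce


def parity(blocks):
    """XOR all blocks of a stripe together, byte by byte (equal lengths assumed)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild(stripe, parity_block):
    """Rebuild the one missing block (marked None) from the survivors plus parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"single parity repairs exactly one loss, got {len(missing)}")
    survivors = [block for block in stripe if block is not None]
    return parity(survivors + [parity_block])


data = [b"disk", b"one ", b"two ", b"thr3"]  # one stripe across four data disks
p = parity(data)                              # stored on a fifth disk

print(rebuild([data[0], None, data[2], data[3]], p))  # b'one '
try:
    rebuild([data[0], None, None, data[3]], p)
except ValueError as err:
    print(err)  # single parity repairs exactly one loss, got 2
```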

kencast has this one patented by Anonymous Coward · 2001-12-13 03:14 · Score: 0

KenCast has a satellite solution that transmits data this way... it's pretty old.
It's just that when too many parts are missing, you won't get Spock or something appearing on the transporter platform.
It's a one-way solution, so lost is lost.

It's www.kencast.com or something.

Formulae=Faster? by StikyPad · 2001-12-13 03:14 · Score: 1

So instead of sending "10," this new method will save time by transmitting the formula 1+1+1+1+1+1+1+1+1+1?

--
https://www.eff.org/https-everywhere

Legal Implications/Questions by Anonymous Coward · 2001-12-13 03:14 · Score: 0

Has anyone considered the legal ramifications a protocol like this would cause? The whole concept of digital copyright (and of course, the 'beloved' DMCA) is to prevent users from sending copyrighted content to one another. But what if you were to send only the _instructions_ on how to make an exact copy?

It's illegal to make a bomb, but it's legal to have instructions to make one (at least here in the U.S.). What do you think?

It's called Forward Error Correction by mprinkey · 2001-12-13 03:16 · Score: 2, Informative

There is nothing "proprietary" here. The techniques for reliable transmission of digital data over unreliable media have been a central area of EE research for at least three decades. The unreliable medium is now UDP instead of broadcast RF or transmission lines.

Line reliability in the "normal" sense is classified by bit error rates. Here, the analogous rating would be the packet loss rate. So, it seems like a straightforward application of FEC. It is useful for the reasons that they state. TCP transmissions look like this:

Client: Request Data Packet #00001
Server: Transmit Data Packet #00001, then wait for Client to say OK or Retransmit.
Client: Request Data Packet #00002
Server: Transmit Data Packet #00002, then wait for Client to say OK or Retransmit.
....
Client: Request Data Packet #20343
Server: Transmit Data Packet #20343, then wait for Client to say OK or Retransmit.
DONE

The FEC/UDP form would look like this:

Client: Request Data FILE
Server: Transmit FEC Data Packet #00001
Server: Transmit FEC Data Packet #00002
...
Server: Transmit FEC Data Packet #20343
Client: OK...I got it. Stop Transmitting.
DONE
There is no per-packet dialog between server and client. This removes network latency from the transfer equation. Note that FEC over UDP does require redundant information to be transmitted, so there is no free lunch here, but it is certainly likely to be faster than any TCP/IP implementation.

Re:It's called Forward Error Correction by ZxCv · 2001-12-13 11:03 · Score: 2

But with TCP, it doesn't usually require per-packet dialog. It starts off per-packet, but the amount sent per round trip then grows exponentially until packets start getting lost. So if the first packet goes through fine, it will send 2 before waiting for a reply. If those go fine, it will send 4, and so on and so forth.
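A back-of-the-envelope sketch of the difference, using the 20343 packets from the dialog example above. It assumes an idealized window counted in packets that doubles every round trip with no losses and no receive-window cap, none of which holds forever on a real connection: a per-packet dialog pays one round trip per packet, while doubling needs only about log2(N) round trips.

```python
def rounds_stop_and_wait(packets: int) -> int:
    """One request/transmit/OK exchange, i.e. one round trip, per packet."""
    return packets


def rounds_slow_start(packets: int, initial_window: int = 1) -> int:
    """Round trips needed if the window doubles each round trip and nothing is lost."""
    sent, window, rounds = 0, initial_window, 0
    while sent < packets:
        sent += window
        window *= 2
        rounds += 1
    return rounds


print(rounds_stop_and_wait(20343))  # 20343
print(rounds_slow_start(20343))     # 15 (1 + 2 + 4 + ... + 16384 = 32767 >= 20343)
```

At 200 ms per round trip, that is roughly 68 minutes of round trips versus 3 seconds, which is why the per-packet picture overstates TCP's latency penalty.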

In practice, the FEC over UDP may be faster than any TCP/IP implementation in certain situations, but when you add in the extra volume of bandwidth (ala this company's implementation), I think it becomes much less attractive.

--

Perl - $Just @when->$you ${thought} s/yn/tax/ &couldn\'t %get $worse;

FEC Library by Orasis · 2001-12-13 03:17 · Score: 5, Informative

Oh yeah, and you can download our completely patent-free and open source FEC library from here and build your own Multicast or UDP based download system very quickly (provided you get the flow control right :)

--
Justin Chapweske, Onion Networks

Some relevant background information by adadun · 2001-12-13 03:18 · Score: 1

For those of you who don't know it, this stuff is based on some very serious research performed at the University of California at Berkeley (yes, the same place that spawned the famous BSD system) together with Digital's Systems Research Center in Palo Alto. It was published at one of the most renowned networking conferences, SIGCOMM, in 1998. Here is a link which provides not only the paper, but also the slides from the presentation.

The digital fountain approach is not a way to transmit information without transmitting it, as the blurb suggests. Rather, it is an ingenious way of using forward error correction (so-called erasure codes) and broadcast (or multicast, for that matter) to distribute data to a large number of receivers.

In short, each data packet includes enough redundant information to allow a receiver that has lost a few packets to get back in sync after receiving a number of the broadcast packets. This way, the sender does not need to do any retransmissions; losses are repaired by the new packets that are sent out.

One place where this kind of technique could be used would be when a new, large, software package such as KDE, GNOME, or the Linux kernel, is to be distributed to a large number of receivers. The sender would just send data in a fixed rate and the receivers would just have to "tune in" to the data stream. After some time, the whole package is received. No more spikes in bandwidth consumption and no more slashdot effects.

Multicast by srichman · 2001-12-13 03:20 · Score: 4, Interesting

Quite correct. This protocol does not sound at all TCP friendly. It needs some way of dynamically responding to network conditions to be that way.

Well, "this protocol" is just multicast (from a network perspective). Though there have been research attempts to introduce pruning in multicast streams, it is inherently a non-flow-controllable transmission protocol. If you take offense at the lack of "TCP friendliness" in multicast, then I suggest you complain to Steve Deering, not Digital Fountain.

Multicast is congestion friendly in that it can inherently and seriously reduce bandwidth consumption, but it's obviously not congestion adaptive. I think the easiest (and probably best) way to introduce congestion control in multicast is to have multiple streams at different rates, e.g., you have five simultaneous multicast webcasts of a Victoria's Secret show, and folks can join the multicast group with the video quality that best suits the bandwidth and congestion on their pipes. This idea works very well for Digital Fountain-style software distribution: rather than having a server serving the same stream at 10 different rates, you can have 10 servers streaming their own unsynchronized Tornado encodings, and clients can subscribe to however many their pipes can support. With 10 streams of data coming at you, you can reconstruct the original data ~10 times as fast.

Re:Multicast by BeBoxer · 2001-12-13 06:01 · Score: 2

Yeah, you are right. This protocol would be perfect for multicast! People have been working on developing "reliable" multicast transmissions of data forever. If you try to do it with the usual TCP-style transfer it's an amazingly hard problem. Especially since one of the standard "requirements" of multicast is that senders don't have to know anything about the receivers! This limitation alone makes reliable data transfers almost impossible within the boundaries of standard multicast.

Until this came along. All a sender needs to do is pump out the data continuously on a given group address at some network-friendly rate. Any receiver who wants the data just connects and slurps down data until it has the whole file. Then disconnect from the group and the routers prune the traffic off of your network. It's genius. Each listener only has to receive traffic for exactly as long as they need to get the whole file! Since the overhead is claimed to be only 5% or so, it's hard to imagine any other scheme being more efficient.

On the other hand, I don't see much benefit for unicast connections. If their idea is to use their algorithms to allow them to send data faster than TCP without flow control, they are going to piss off a lot of network managers. Their traffic is going to look an awful lot like a DoS attack. In fact, without flow control in a lot of cases it will be a DoS attack. The article gives the case of somebody with a 32Mbps connection who only gets 0.5Mbps out of TCP. Well, if they decide "hey, I'll use 1/3 of my connection for this traffic" and start dumping 10Mbps into a network that was already bottlenecked, they can expect a call from a very upset network person who wants to know why they are melting down their network.

There are also some lies about TCP in the article. TCP does not require packets to arrive in sequence. Well, a few old brain-dead stacks did. But no OS made in the last few years cares that much about out-of-order packets. If you have a high-bandwidth connection but you can't seem to get much throughput, one of two things is happening: you have buffers that are too small on one or both ends of the connection, or the network is really too congested to go much faster. In the first case, there are numerous fixes if you take the time to research them (although I'll admit it's a pain in most cases). Here's a search for more info. The article even implies that this is the case they are trying to fix. "They had too much turnaround". Hmm. Sounds like latency. 32Mbps * (your round trip time) = suggested buffer size. You don't need to spend 100K on hardware, just tweak your TCP stack! Or run a 2.4 kernel, which will do a pretty darn good job of autotuning your buffers on the fly.
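The buffer-size rule of thumb above is the bandwidth-delay product. A minimal sketch: the 32 Mbit/s figure comes from the article, while the 200 ms round-trip time is only an assumed example value. Comparing the result with 65535 bytes (the largest window TCP can advertise without the window scaling option) shows why the stack needs tuning.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight (unacknowledged) to keep the pipe full."""
    return bandwidth_bps * rtt_s / 8


need = bdp_bytes(32e6, 0.200)  # 32 Mbit/s from the article, 200 ms RTT assumed
print(f"buffer needed: {need / 1e3:.0f} KB")               # 800 KB
print(f"vs. a 64 KB unscaled window: {need / 65535:.1f}x")  # 12.2x
```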
Re:Multicast by Anonymous Coward · 2001-12-13 09:02 · Score: 0

If you had any real experience with high-speed transmission over transcontinental links, you'd know that the packet loss rate inevitably kills you with TCP. Don't think out of order reception, think dropped packets. Now you don't need to retransmit them anymore.

Understand now?
Re:Multicast by BeBoxer · 2001-12-13 13:22 · Score: 2

No. Packet loss happens because the link is congested. TCP slows down because the link is congested. Blasting traffic onto a congested link without flow control does not magically make the link less congested. It makes it more congested. Their algorithms don't actually avoid retransmissions. They just rename them. There isn't any fancy math that can help you avoid packets getting dropped.

Let's take an extreme example of 50% packet loss. Let's also say that the data would fit in 100 packets. With TCP, the sender will have to send about 200 packets in order to get the 100 needed packets through to the receiver. The same is true of the 'fountain' method. Just because the 200 packets are all different doesn't mean that the sender doesn't need to send them, which is what counts.

Now, with their method they might send the 200 packets faster than TCP would. But in the larger picture, that just screws everybody else. Look at the white papers on their site describing their technology. They give the example of somebody who has T3 (45Mbps) links, but the network in between them is suffering from 200ms latency and 2% packet loss (which is a shitload, by the way). TCP, of course, runs slowly on such a network. As it should. They talk about using their tech to just firehose UDP packets out at 45Mbps, and hey! Your file will get through in seconds instead of hours! Which of course is BS. If you start slamming 45Mbps of un-flow-controlled traffic onto a congested link, you will probably take it completely down. If you keep it up, you can expect somebody to eventually start filtering your traffic out because you are DoS'ing the network. At best. At worst, you can expect a visit from the FBI, because your behavior is exactly the same as a script kiddie DoS'ing somebody, only you won't be bothering to forge your source address, so you'll be easy to find.

Don't get me wrong, their technology has some wonderful applications. But intercontinental traffic is sometimes slow because the links are congested! No fancy math is going to change that. Ignoring the fact that the link is full and blasting traffic into it without flow control will only make it worse! They might think they are on to something really cool, but if it works at all, it will only work for one person at everyone else's expense. If everyone stops using flow control and just blasts traffic out as fast as they can, performance will really start sucking.

All of which they realize, which is why they are working on RFCs for flow control. And when those RFCs get approved, you can bet that the congestion backoff will look a lot like TCP's, because we have a pretty good idea of how TCP behaves on a macro scale. The whole spin of getting super fast network transfers is basically marketing BS, something which I'm sure the developers are aware of. The real advantages, which are the ones they are pursuing in RFCs, are reliable multicast and (almost) stateless data serving.

All about Digital Fountain by TheSync · 2001-12-13 03:21 · Score: 5, Informative

OK folks, here is the "real deal."

Digital Fountain's core technology is called "meta-content (tm)". The meta-content engine produces packets based on the original content, such that if the receiver receives enough packets (just slightly more than the size of the original content), the original content can be recreated. The neat part is that it doesn't matter which meta-content packets you get. If you need to receive 10 packets, you can get 5, miss 5, get another 5, and it works. Or you can get 1, miss 10, get 9, and it works as well. As long as you receive some 10 packets from the "fountain," you can recreate the original content.
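To make "it doesn't matter which packets you get" concrete, here is a toy random linear fountain over GF(2). Each packet is the XOR of a random subset of the K source blocks, tagged with a bitmask saying which ones. This is not Digital Fountain's meta-content or a Tornado code (those are designed to encode and decode far faster); it is a sketch with hypothetical names that shows the same behavior: the receiver needs only slightly more than K packets, any packets will do, and the sender just keeps going until the receiver says "enough".

```python
import random


def encode_packet(source, rng):
    """One fountain packet: XOR of a random nonempty subset of the source blocks."""
    k = len(source)
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(k)
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= source[i]
    return mask, value


class Receiver:
    """Gaussian elimination over GF(2), one packet at a time."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # pivot bit -> (mask, value), kept fully reduced

    def add(self, mask, value):
        for bit, (m, v) in self.rows.items():  # clear known pivots from the packet
            if (mask >> bit) & 1:
                mask, value = mask ^ m, value ^ v
        if mask == 0:
            return False  # redundant packet: nothing new learned
        pivot = mask.bit_length() - 1
        for bit, (m, v) in list(self.rows.items()):  # clear the new pivot from old rows
            if (m >> pivot) & 1:
                self.rows[bit] = (m ^ mask, v ^ value)
        self.rows[pivot] = (mask, value)
        return True

    def done(self):
        return len(self.rows) == self.k

    def decode(self):
        # At full rank every row's mask is a single bit, so its value is that block.
        return [self.rows[i][1] for i in range(self.k)]


def transfer(source, loss, rng):
    """Sender streams packets until the receiver has enough; returns (data, sent, received)."""
    rx = Receiver(len(source))
    sent = received = 0
    while not rx.done():          # the single "OK, enough!" at the end
        packet = encode_packet(source, rng)
        sent += 1
        if rng.random() >= loss:  # this packet survived the network
            received += 1
            rx.add(*packet)
    return rx.decode(), sent, received


rng = random.Random(2001)
blocks = [rng.getrandbits(32) for _ in range(50)]  # 50 source blocks of 32 bits
data, sent, received = transfer(blocks, loss=0.5, rng=rng)
print(data == blocks, sent, received)  # True, then the counts; received is just over 50
```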

Why is this cool? Several reasons. Digital Fountain claims that TCP connections with RTT of 200ms and 2% packet loss have a bandwidth limitation of 300kbps, no matter how fast the actual transport channel is. So you just go to town to full bandwidth with UDP to use up the entire channel, and use Digital Fountain technology so it doesn't matter which 2% of packets get lost, just as long as you transmit enough packets to make up for the packet loss.

OK, why else is this cool? Imagine a Digital Fountain continuously transmitting meta-data on a multicast address. If you want to receive a file, just listen to the multicast address. It doesn't matter when you start listening, just as long as you listen for enough packets to recreate the original file. Multicast file distribution.

Interestingly enough, Digital Fountain has also pioneered multicast on-demand streaming, but the real secret sauce there is something besides meta-content, but meta-content makes it easier.

As some people have mentioned, you can use UDP with FEC to achieve some error correction. But meta-content can handle long burst errors, whereas FEC is only appropriate for short, random errors. You can literally unplug the ethernet, wait a while, and plug it back in, and you're still good to go with Digital Fountain, as long as you listen long enough.

I should mention, DF has something called "Fair Layered Increase Decrease Protocol," or FLID, to keep their UDP burst from blowing away other TCP traffic on the network.

For more information on the basic Digital Fountain technology, see: A Digital Fountain Approach to Reliable Distribution of Bulk Data.

Re:All about Digital Fountain by mikeee · 2001-12-13 03:42 · Score: 3, Insightful

Digital Fountain claims that TCP connections with RTT of 200ms and 2% packet loss have a bandwidth limitation of 300kbps, no matter how fast the actual transport channel is.

First, 2% packet loss is terrible, even on a WAN.

Second, 200ms is terrible latency, unless you're crossing an ocean.

Neglecting packet loss (which ought to be negligible, though admittedly it isn't at 2%), your maximum TCP throughput will be (TCP window size)/(round-trip time), or the link bandwidth, whichever is less. With a 64 KB window, that comes to about 2600 kbps on a 200 ms RTT link if you aren't using TCP window scaling options, and much higher if you are.
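A sketch comparing the two limits for the quoted figures, treating the 200 ms as a round-trip time as in Digital Fountain's claim. The window limit is one 65535-byte window per round trip; for the loss limit it swaps in the well-known Mathis et al. approximation, throughput ≈ (MSS/RTT)·sqrt(3/2)/sqrt(p), with an assumed 1460-byte MSS. Under these assumptions the loss term dominates at about 0.5 Mbit/s, the same order of magnitude as the 300 kbps figure.

```python
from math import sqrt


def window_limit_bps(window_bytes: int, rtt_s: float) -> float:
    """No more than one window of unacknowledged data per round trip."""
    return window_bytes * 8 / rtt_s


def loss_limit_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. approximation: (MSS / RTT) * sqrt(3/2) / sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) * sqrt(1.5) / sqrt(loss)


def tcp_estimate_bps(link_bps, window_bytes, mss_bytes, rtt_s, loss):
    """Whichever limit bites first."""
    return min(link_bps, window_limit_bps(window_bytes, rtt_s),
               loss_limit_bps(mss_bytes, rtt_s, loss))


rtt, loss = 0.200, 0.02  # the figures from Digital Fountain's claim
print(window_limit_bps(65535, rtt) / 1e3)     # about 2621 kbit/s (no window scaling)
print(loss_limit_bps(1460, rtt, loss) / 1e3)  # about 506 kbit/s at 2% loss
print(tcp_estimate_bps(44.736e6, 65535, 1460, rtt, loss) / 1e3)  # about 506 on a T3
```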
Re:All about Digital Fountain by hqm · 2001-12-13 03:54 · Score: 2, Informative

You can easily use FEC for burst errors, you just use a cross-interleaved encoding on top of the basic encoding. That's what CD players do and they can eat a 4000 bit long burst error without a hiccup. And CD-ROM encodes another layer on top of CD Audio, and thus can sustain even larger burst errors.
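A minimal sketch of plain block interleaving; it is not the CD's actual cross-interleaved Reed-Solomon scheme, which is more elaborate. Writing D codewords as rows and transmitting column by column means a burst of up to D consecutive losses costs each codeword at most one symbol, which an ordinary short code can then repair. Symbol names are made up.

```python
def interleave(codewords):
    """Write codewords as rows, then transmit the grid column by column."""
    length = len(codewords[0])
    return [cw[i] for i in range(length) for cw in codewords]


def deinterleave(stream, depth):
    """Undo interleave() for `depth` codewords."""
    length = len(stream) // depth
    return [[stream[i * depth + d] for i in range(length)] for d in range(depth)]


depth, length = 4, 6
codewords = [[f"c{d}s{i}" for i in range(length)] for d in range(depth)]
stream = interleave(codewords)
for pos in range(8, 12):  # a burst wipes out 4 consecutive symbols on the wire
    stream[pos] = None
received = deinterleave(stream, depth)
print([row.count(None) for row in received])  # [1, 1, 1, 1]: one loss per codeword
```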
Re:All about Digital Fountain by BlueUnderwear · 2001-12-13 03:55 · Score: 3, Insightful

As some people have mentioned, you can use UDP with FEC to achieve some error correction. But meta-content can handle long burst errors, whereas FEC is only appropriate for short, random errors.
This depends on the parameters of your FEC algorithm. Most FEC algorithms do indeed divide the data to be transmitted into "slices" of n blocks, to which they add k blocks. If more than k blocks are lost per slice, you're SOL, even if enough extra blocks are available in other slices. However, there is a way around that: just tune your FEC algorithm so that n is large enough for all of your file to fit into one slice.
The downside of this is that the larger the slice is, the more computationally intensive it is. If the only thing you're concerned about is burst errors, just interleave your slices.
You can literally unplug the ethernet, wait a while, and plug it back in, and you're still good to go with Digital Fountain, as long as you listen long enough.
This is possible with run-of-the-mill FEC algorithms as well, as long as you put your entire file into a single giant slice.
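Why one giant slice helps can be sketched with binomial probabilities, assuming an ideal code inside each slice (any n of its n + k blocks suffice) and independent block loss. The numbers are hypothetical: a 100-block file, 10% overhead, 5% loss, sent either as ten 10+1 slices or as one 100+10 slice.

```python
from math import comb


def slice_ok(n: int, k: int, p: float) -> float:
    """P(at most k of a slice's n + k blocks are lost), each lost independently with prob p."""
    total = n + k
    return sum(comb(total, r) * p**r * (1 - p) ** (total - r) for r in range(k + 1))


def file_ok(data_blocks: int, slice_data: int, slice_parity: int, p: float) -> float:
    """Every slice must survive on its own: slices can't borrow each other's parity."""
    if data_blocks % slice_data:
        raise ValueError("this sketch assumes the file splits into whole slices")
    return slice_ok(slice_data, slice_parity, p) ** (data_blocks // slice_data)


loss = 0.05  # hypothetical 5% independent block loss
print(file_ok(100, 10, 1, loss))    # ten 10+1 slices: roughly 0.34
print(file_ok(100, 100, 10, loss))  # one 100+10 slice: roughly 0.98
```

Same overhead, very different odds, at the cost of a much bigger decoding problem per slice.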

--
Say no to software patents.
Re:All about Digital Fountain by Arlet · 2001-12-13 04:41 · Score: 2

2% packet loss is not terrible if you consider that most packet loss on a WAN is caused by routers dropping frames because of congestion. TCP uses knowledge from these dropped frames to detect this congestion, and basically reduces its throughput to match the link's bottleneck. 200 ms latency isn't great, but it probably assumes crossing the ocean.

Now, assuming the dropped packets are due to congestion at a bottleneck, there's no reason to assume that UDP+FEC can seriously increase bandwidth. If people all start to pump UDP packets at T1 speeds into a WAN, serious congestion will occur, and some router is going to drop the majority of the packets.

Of course, if only a single person is doing it, bandwidth can be improved, but I bet you can achieve similar results by removing congestion control from TCP and using big windows.
Re:All about Digital Fountain by Anonymous Coward · 2001-12-13 08:15 · Score: 0

Theoretically, bandwidth has no practical limit. Latency is limited by the speed of light.
Higher bandwidth/lower latency is easier if you can accept more errors in transmission.

NOT NEWS. by tqbf · 2001-12-13 03:23 · Score: 5, Informative

Digital Fountain has been around, with product, for a long time. The technique they are building on for file transfer has been around even longer. Their protocols are even IETF drafts/standards now.

The concept of "sending equations" instead of data is extremely well-known. It's called Forward Error Correction (FEC). FEC is a very simple idea: take the source data and encode it with parity data so that you can reconstruct the original message from any N chunks of it. One form of FEC that even your stereo components might already do is Reed-Solomon encoding; you can look this up in CS textbooks. If you Google for "Luigi Rizzo Vandermonde", you'll find a fast, free C library for FEC that you can use in your own applications.
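To make "reconstruct the original message from any N chunks" concrete, here is a toy Reed-Solomon-style erasure code. Assumption: it works over the prime field GF(257) with plain polynomial interpolation, rather than the GF(2^8) Vandermonde arithmetic that real libraries such as Rizzo's use, so a parity symbol can take the value 256 and would not fit in a byte.

```python
import itertools

P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Systematic encoding: data symbols sit at x = 0..k-1, parity at x = k..n-1."""
    base = list(enumerate(data))
    return base + [(x, lagrange_eval(base, x)) for x in range(len(data), n)]

def decode(received, k):
    """Rebuild the k data symbols from ANY k received (x, y) pairs."""
    if len(received) < k:
        raise ValueError("need at least k symbols")
    return [lagrange_eval(received[:k], x) for x in range(k)]

msg = list(b"Hello")       # k = 5 data symbols
coded = encode(msg, n=8)   # 5 data + 3 parity symbols, each tagged with its x
# Every possible choice of 5 surviving symbols recovers the message.
assert all(decode(list(s), 5) == msg for s in itertools.combinations(coded, 5))
```

Any 3 of the 8 symbols can be lost, whichever they are, which is the same property PAR parity volumes rely on.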

Digital Fountain was started by Mike Luby, who is something of a luminary in information theory/cryptography. The kernel of their company's IP is "tornado codes", an FEC codec that is leaner and faster than competing codecs. The basis for their protocols, last time I checked, is FLID/DL and RLC. These protocols set up multiple channels (or "sources" if you must), each transmitting random chunks of FEC-encoded files.

The drawbacks to FEC are that it can take a long time to encode data for transfer, which makes FEC hard for some real-time applications like mail, and that the amount of data transferred is going to be some percentage greater than the original file (owing to parity information). But the drawback to FEC file transfer protocols is much more significant: they aren't TCP-friendly.

The whole Internet depends on protocols that have built-in congestion responses that mimic those of TCP. Protocols that don't will either starve TCP flows or be starved by them. Protocols with no real congestion response at all rapidly destabilize Internet links by consuming all available resources. Digital Fountain originally targeted multicast media transfer. Congestion avoidance in multicast schemes is still an open research question. Does this protocol really scale?

More to the point, what's the benefit? There's obvious payoff for FEC in multicast, where backtraffic from ACKs and NAKs quickly overwhelms the group and kills performance. But in unicast-world, where we will all be living for the next decade, TCP and TCP-friendly forward-reliable protocols like SCTP already provide good performance.

Slow news week at EETimes I guess. Or Digital Fountain just hired a PR firm.

Re:NOT NEWS. by Scooby+Snacks · 2001-12-13 04:01 · Score: 1

One form of FEC that even your stereo components might already do is Reed-Solomon encoding; you can look this up in CS textbooks.
Reed-Solomon encoding has begun to be used on Usenet in the form of parity, or .par, files. The idea is that you create x parity files for a multipart post and post them along with the actual data files. Then if you get an incomplete or corrupted file while downloading, you simply download a parity file for each file you're missing. There are at least two different implementations for Windows, and a GPL'd implementation available. In the source tarball for parchive, there is a text file (called rs.doc) that explains the mathematics behind Reed-Solomon encoding as well as C implementation details.
For those of us that don't have a CS textbook handy. ;-)

--
Runnin' around, robbin' banks all whacked on the Scooby Snacks...
Re:NOT NEWS. by adadun · 2001-12-13 04:10 · Score: 2, Informative

The whole Internet depends on protocols that have built-in congestion responses that mimic those of TCP. Protocols that don't either starve TCP flows, or are starved by them. Protocols with no real congestion response at all rapidly destabilize Internet links by consuming all available resources. Digital Fountain originally targeted multicast media transfer. Congestion avoidance in multicast schemes is still an open research question. Does this protocol really scale?

Yes, their protocol uses TCP-friendly congestion control - read about it in their SIGCOMM paper.
Re:NOT NEWS. by msouth · 2001-12-13 09:48 · Score: 2

well, ok, but it was news to me. :)

--
Liberty uber alles.
Re:NOT NEWS. by tqbf · 2001-12-13 09:54 · Score: 2

The irritating thing is that scalable FEC schemes have been used for reliable multicast for a long, long time, but Digital Fountain (and to a lesser extent Swarmcast) are gathering accolades for what is basically a simple refinement of a very well-known technique.
Re:NOT NEWS. by Anonymous Coward · 2001-12-13 10:02 · Score: 0

yep. par files rule. makes dling pr0n a piece o cake!
Re:NOT NEWS. by msouth · 2001-12-13 10:28 · Score: 2

But haven't you heard? "gathering accolades for what is basically a simple refinement of a very well-known technique" is all the rage. Except they usually spell "accolade" p-a-t-e-n-t.

Wasn't criticizing your post, btw, just taking the opportunity to be funny.

mike

--
Liberty uber alles.

Droppage by srichman · 2001-12-13 03:27 · Score: 2

You still need some form of flow control or rate limiting, otherwise a large percentage of the UDP packets are going to get dropped.

The whole point of Digital Fountain's erasure encoding scheme is that it doesn't care how many packets get dropped on the floor; a client can reconstruct the original data just as efficiently with whatever packets it receives as if the stream had been sent at an optimal rate to begin with.

Plus, you have the problem of UDP streams stealing bandwidth from TCP streams on a limited bandwidth link.

This is a problem. You want to satisfy the people with the fat pipes, but not destroy all the little pipes and hose people's TCP windows. See my other post for a possible solution.

from P2P to P2net? by gotan · 2001-12-13 03:35 · Score: 2

Yup, reading a few answers here, I'm already reading up on FEC (why didn't EE Times mention this?). The nifty thing about this, as far as I understand it, is that it's also a nice way to store information in a distributed way (think Freenet, Gnutella, ...). This opens a whole new perspective on the whole P2P theme, since a piece of information wouldn't be stored on one host alone, and because of the implied redundancy, the information would still survive in that network until a critical number of hosts were disconnected. Also, network load would be distributed on the sending end, opening some interesting ways for congestion control.

--
"By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks

Re:from P2P to P2net? by wjr · 2001-12-13 07:03 · Score: 1

Look at Mojonation for an implementation of exactly what you just described.
Re:from P2P to P2net? by Anonymous Coward · 2001-12-13 09:27 · Score: 0

It would be cool if a huge number of clients each stored only one part of the 'recipe', say for an image of WinXP. Nobody would have a copy of XP, but when combined, the parts would 'somehow' turn into a WinXP image.

Now that would make BSA and RIAA happy. :)

Swarmcast by Anonymous Coward · 2001-12-13 03:36 · Score: 0

That company that was giving out the "Open Source cola" at all the shows last year does something to the effect of "Swarming" for data.

Read here

http://www.opencola.com/products/4_swarmcast/technology.shtml

Simple working example by claytronics · 2001-12-13 03:40 · Score: 1

Here is a simple example that works the way the article describes. Suppose the original data can be broken into 3 equal length packets A, B, and C.

Send the following 4 packets (where # represents XOR):

A#B

B#C

C

A

Note that ANY three of the four packets can be used to reconstruct A, B and C! You just have to XOR them appropriately. It is a bit like solving a linear equation. I haven't tried to extend it to more packets, but I will fiddle with it more tonight.
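The claim can be checked mechanically. A sketch, assuming each packet carries a bitmask saying which source blocks it XORs together (a hypothetical header format, not any particular protocol's): recovering A, B and C is then Gaussian elimination over GF(2), which makes the "linear equation" analogy literal.

```python
import itertools

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def solve_gf2(packets, k):
    """Gauss-Jordan elimination over GF(2).

    Each packet is (mask, payload): bit i of mask set means source block i
    was XORed into payload. Returns the k source blocks, or None if the
    packets do not pin them all down."""
    rows = [[mask, payload] for mask, payload in packets]
    for col in range(k):
        piv = next((r for r in range(col, len(rows)) if (rows[r][0] >> col) & 1), None)
        if piv is None:
            return None  # block `col` cannot be isolated from these packets
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(len(rows)):
            if r != col and (rows[r][0] >> col) & 1:
                rows[r][0] ^= rows[col][0]
                rows[r][1] = xor_bytes(rows[r][1], rows[col][1])
    return [rows[i][1] for i in range(k)]

A, B, C = b"AAAA", b"BBBB", b"CCCC"
sent = [(0b011, xor_bytes(A, B)),  # A#B  (bit 0 = A, bit 1 = B, bit 2 = C)
        (0b110, xor_bytes(B, C)),  # B#C
        (0b100, C),                # C
        (0b001, A)]                # A
for received in itertools.combinations(sent, 3):
    assert solve_gf2(list(received), 3) == [A, B, C]
```

Any two packets, on the other hand, leave one block undetermined and the function returns None.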

Re:Simple working example by Junta · 2001-12-13 03:48 · Score: 2

And the process you describe does nothing to show a benefit. Showing that we can represent the three original packets using three different packets is a rather pointless endeavor, and sending all four seems really silly. If you scale that principle up at all, pure XORing doesn't save any bandwidth. You need a better example to illustrate how this would actually be useful. I don't think you properly understand what the article is getting at...

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Simple working example by claytronics · 2001-12-13 04:11 · Score: 1

It works! The usefulness is that I can send the four packets WITHOUT error correction to ensure that they actually arrived. The listener doesn't have to say: "Whoops, I missed packet 2, please resend it", it just has to keep listening. Packets 1, 3, and 4 are sufficient to reconstruct the original message.
If the listener successfully received packets 1,2 and 3, it would signal the sender to stop. Packet 4 wouldn't be needed.
The point is that the communication back and forth to resend missing packets takes time and adds latency. This scheme (obviously extended to more than 3 packets) allows the sender to keep sending at full speed, and to not worry about dropped packets. The receiver just listens until it receives the expected number of packets, and no matter which packets were dropped (whether it received 123, 134, 234, or 124) it can reconstruct the entire message!
It is tremendously useful in improving throughput, esp. on connections with large latency. For example, if I were communicating with Mars, I wouldn't want to have to go back and say: "Please resend packet #434" as it would add a 6 minute latency!! I can just keep listening for ONE extra packet, and I will have the entire message.
I hope that clears up my example above.
Re:Simple working example by Junta · 2001-12-13 04:25 · Score: 2

Yes, but that doesn't work well in practice. First, by sending an extra packet, you're wasting bandwidth. An acknowledgement that the first three packets have been received would likely arrive too late to stop transmission of the fourth packet; otherwise, your complaint about negative acknowledgements being too slow becomes moot.
For another, a large portion of transmission errors are not localized to a single packet, thus providing for a single packet to be in error (parity scheme) is essentially useless, as you'd probably need to send a negative acknowledgement anyway.

--
XML is like violence. If it doesn't solve the problem, use more.

linearly independent equations by zby · 2001-12-13 03:41 · Score: 1

Somebody has already posted an example.
x + y = 4
x + 0y = 4
2x + y = 8
0x + y = 0
You can take any two of these equations to be able to extract the solution x = 4, y = 0.
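A quick check of that claim with Cramer's rule on every pair, using exact fractions so no rounding creeps in. Each equation a·x + b·y = c is written as the tuple (a, b, c):

```python
from fractions import Fraction
from itertools import combinations

equations = [(1, 1, 4), (1, 0, 4), (2, 1, 8), (0, 1, 0)]

def solve_pair(e1, e2):
    """Solve two equations in x and y by Cramer's rule; None if they are dependent."""
    (a1, b1, c1), (a2, b2, c2) = e1, e2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # the two equations carry the same information
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))

for e1, e2 in combinations(equations, 2):
    print(e1, e2, solve_pair(e1, e2))
```

All six pairs have a nonzero determinant, so every pair yields x = 4, y = 0: each equation plays the role of one received packet.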

Read the article by underpaidISPtech · 2001-12-13 03:42 · Score: 1

No mention of UDP in the article.
No mention of distributed data in the article.
No mention of compression in the article.

Hmm, perhaps some sheeple*cough*people got lost on the way. Here is the link again. Read it over.

Take a chunk of data, reduce it to an equation. Break equation into symbols. Send. Receive a percentage of symbols. Perform XOR on symbols received. Rebuild equation, solve equation, recreate data. Stop send, stop receive. Thank you for using StarTrek FTP server, goodbye.

As the cost of the boxes is so high, they will probably only be used by backbones. I was hoping this would be a software-only solution. Then we would really see Digital Convergence. Hell, our processors are fast enough...

One application that combines multiple sources: by JoeGee · 2001-12-13 03:44 · Score: 1

Kazaa's file sharing client, also used by MusicCity, combines multiple data sources to maximize the requesting client's use of bandwidth.

--

Get off my virtual lawn, you damned virtual kids!

It's called Reed-Solomon Encoding by hqm · 2001-12-13 03:47 · Score: 1

One of the most well known forward error correction algorithms is the Reed-Solomon coding, developed in the 1960's. It is the encoding used for CD Audio and CD ROM discs, as well as being widely used in disk drive controllers.

I have a GPL implementation at http://rscode.sourceforge.net/

Phil Karn (KA9Q) also has an even better implementation available for download someplace.

I had the idea, along with many other people, of distributed filesystems using FEC encoding. My original idea was to spread files over a large number of anonymous FTP servers, and then "harvest" them later. I also was interested in using FEC for UDP audio streams, back in 1994 or so. Anyway, this is old news, and I would be alarmed if any patents were granted on the idea of using error correction coding to transfer files.

Clever Algorithm... by Schnake · 2001-12-13 03:52 · Score: 1

Well, break up the large file into 1K chunks... Send the MD5 hash for each chunk, and then apply a lossy compression algorithm on the data chunk, sending that across too...

The other side rebuilds the lossy data chunk, and from that point, you've reduced the search space needed to find the original data using brute-force reverse MD5 hashing.

Of course, you'd need to find the optimal values for the data chunk size, and whatever lossy algorithm you choose... But I have a good feeling this will work pretty well!

Then you'll see people beefing up their computers to increase download speeds... Probably even revive chip sales, and push AMD and Intel to develop faster processors. Maybe nVidia will come out with the DPU - Download Processing Unit. ;-)

Pimping your product on Slashdot by Ratbert42 · 2001-12-13 03:53 · Score: 1

Nowhere in the article does it mention UDP. We know michael never reads the articles. So where did UDP come from? Is Wolfger a DigitalFountain plant? Doesn't look that way, but it's possible.

Yes and there is a NICE linux client by Anonymous Coward · 2001-12-13 04:10 · Score: 0

eDonkey2000 does indeed grab from many sources at one time. There is a nice Linux client for it as well. The Windows client has fancier graphics, but eDonkey's client is light-years ahead of kza

Not quite like Kazaa by Anonymous Coward · 2001-12-13 04:10 · Score: 1, Informative

If I am not mistaken, they are using FECs to transmit data to get over potential latencies incurred by waiting for TCP ACKs and packet ordering. In this way they worry less about dropped packets since any of the FEC packets can help reconstruct lost bits. In addition, if you kick off transfers from multiple sources, you never have to worry about receiving packets in order, etc...

Luby Transform Better for Giant Blocks - So What? by randyjparker · 2001-12-13 04:20 · Score: 1

Related techniques: Swarmcast, Typhoon, rmt, Fcast, mgpm, onionnetworks

Michael Luby and Amin Shokrollahi are CTO and Chief Scientist of Digital Fountain. They are a couple of genuine experts on FEC (forward error correction) and have published academic papers on this topic. While at Berkeley's International Computer Science Institute, Professor Luby wrote "Benchmark Comparisons of Erasure Codes" (http://www.icsi.berkeley.edu/~luby ), "Accessing Multiple Mirror Sites in Parallel: Using Tornado Codes to Speed Up Downloads", and "A Digital Fountain Approach to Reliable Distribution of Bulk Data" (a slideshow pitch - these last two are on digitalfountain.com/technology/library/papers ). Luby also wrote "The Use of Forward Error Correction in Reliable Multicast" (an Internet Engineering Task Force draft).

See also "Fcast multicast file distribution: Tune in, download, and drop out" from Jim Gemmell and Jim Gray of Microsoft, published in Dr. Dobb's.

Professor Luigi Rizzo from Università di Pisa (poke around http://www.iet.unipi.it/~luigi ) wrote a whole bunch of papers on Vandermonde matrix codes, including a downloadable sample program. Rizzo's work is the guts of several open-source FEC projects (most of which seem to have dried up).

All three of these guys (Luby, Rizzo, Gemmell) co-wrote "Asynchronous Layered Coding: A massively scalable reliable content delivery protocol" (another IETF draft from 7/20/2001).

My opinion is that Luby et al. have improved on Rizzo's FEC techniques by application of a great deal of mathematics. The Digital Fountain "LT" (Luby Transform) codes are (modest?) improvements on Tornado Codes. The Tornado Codes require less CPU than Rizzo's Vandermonde codes for gigantic erasure-code block sizes. But so what? I don't know what's so wrong with 1K packet sizes. If the much simpler Reed-Solomon math serves well for modest block sizes, why not use modest block sizes? Complexity matters! Why write complicated code unless you really need something it alone can do? A professor may create a more optimal solution, but there are lots of examples of simple approaches that are more than good enough.

I do believe that FEC provides benefits far beyond what can be done with ftp. The issue of ack-latency and sliding window sizes is completely eliminated. You may transmit 50-300% more bytes to get the content, but at full network speed every time! And if you are transmitting the file to multiple receivers, they can join a multicast group and get the same stream without concern with where in the content they begin. As long as they get a certain fraction of the transmission, they can compute the original content. This means that tons of "dropped packets" and "data loss" can be tolerated without concern for data integrity. Specific data relating content expansion (50%? 200%?) vs data loss (bit error rate, correlated / uncorrelated packet loss), and CPU consumed by the algorithm **for code blocks of, say 100k-500k** is something I'm still trying to locate. Several of the big players try to obscure these comparisons for commercial reasons, and I don't blame them. They've spent years developing their algorithms. On the other hand, if their approaches are really much better than the simpler techniques, they have not convincingly demonstrated it. Or is there some advantage in choosing erasure code blocks of many megabytes? Why not just use Unix "split" on the file before encoding? Maybe that's what some of these other guys actually do, and maybe it works just fine.

The algorithm complexity hierarchy looks like:

1) XOR
2) Vandermonde-based RSE (Reed-Solomon Erasure)
3) Cauchy-based RSE
4) Tornado
5) LT

I was a skeptic... by DaoudaW · 2001-12-13 04:21 · Score: 5, Informative

For example, consider the transfer of a 1 GB file with an average Internet packet loss rate (2%) and global average RTT (just over 200 msec). For this case TCP has an inherent bandwidth limitation of 300 Kbps regardless of link speeds and thus will take more than 7 hours to transfer the file. A Digital Fountain server sending Meta-Content allows almost complete bandwidth utilization. With Digital Fountain, on a fully utilized 1.5 Mbps (T1) line the transfer of the 1 GB file will take about 1.5 hours and on a fully utilized 45 Mbps (T3) line the transfer time will be a little over 3 minutes.
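The white paper's arithmetic can be roughly re-derived. A common back-of-the-envelope ceiling for TCP is the Mathis et al. approximation, throughput ≈ (MSS / RTT) · C / √p. Assumptions in this sketch: MSS = 1460 bytes, C = √(3/2), and 1 GB = 2^30 bytes. The white paper does not say how it reached 300 Kbps; this formula lands at roughly 500 kbps, the same order of magnitude.

```python
from math import sqrt

def mathis_tcp_bps(mss_bytes, rtt_s, loss_rate, c=sqrt(3 / 2)):
    """Approximate steady-state TCP throughput: (MSS / RTT) * C / sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) * c / sqrt(loss_rate)

def transfer_seconds(size_bytes, bps):
    """Ideal transfer time at a constant rate, ignoring headers and overhead."""
    return size_bytes * 8 / bps

GB = 2 ** 30  # assumption: a binary gigabyte
estimate = mathis_tcp_bps(1460, 0.2, 0.02)
print(f"Mathis estimate at 2% loss, 200 ms RTT: {estimate / 1e3:.0f} kbps")
for label, bps in (("TCP at 300 kbps", 300e3), ("full T1", 1.5e6), ("full T3", 45e6)):
    t = transfer_seconds(GB, bps)
    print(f"{label}: {t / 3600:.2f} h ({t / 60:.1f} min)")
```

With these assumptions the quoted figures check out: just under 8 hours at 300 kbps, about 1.6 hours on a T1, and a little over 3 minutes on a T3.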

I was a skeptic, but now that I've read the Digital Fountain white papers, Meta-Content Technology White Paper and TCP White Paper, I'm a True Believer.

I don't pretend to understand all the details, but the technology is based on the Luby transform which was invented by the company's founder. Essentially the content is broken into segments which are then randomly XORed to produce the metacontent. This provides redundancy since each segment of metacontent contains information from several of the original segments. Also the metacontent segments do not require sequential transmission.
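A toy version of the "randomly XORed" idea in the spirit of LT codes: each output symbol is the XOR of a random set of source segments, and the receiver "peels" segments out as symbols arrive, in whatever order they come. Assumptions: segments are single integers rather than packets, the degree distribution is made up for illustration (real LT codes use Luby's robust soliton distribution), and a 30% drop rate is simulated. This sketches the general technique, not Digital Fountain's implementation.

```python
import random

def lt_encode_symbol(blocks, rng):
    """One output symbol: the XOR of a random subset of the source blocks."""
    degree = min(rng.choice([1, 2, 2, 3, 3, 4]), len(blocks))  # made-up distribution
    neighbours = set(rng.sample(range(len(blocks)), degree))
    value = 0
    for i in neighbours:
        value ^= blocks[i]
    return neighbours, value

def lt_decode(symbols, k):
    """Peeling decoder: a symbol with exactly one unknown neighbour reveals that block."""
    known = {}
    pending = [[set(nbrs), val] for nbrs, val in symbols]  # work on copies
    progress = True
    while progress and len(known) < k:
        progress = False
        for sym in pending:
            nbrs, val = sym
            solved = {i for i in nbrs if i in known}
            for i in solved:
                val ^= known[i]  # strip out blocks we already have
            nbrs -= solved
            sym[1] = val
            if len(nbrs) == 1:
                known[nbrs.pop()] = val
                progress = True
    return [known[i] for i in range(k)] if len(known) == k else None

rng = random.Random(2001)
source = [rng.randrange(256) for _ in range(20)]  # 20 one-byte "segments"
received, decoded = [], None
while decoded is None:
    symbol = lt_encode_symbol(source, rng)
    if rng.random() < 0.3:
        continue  # dropped packet: the receiver never asks for it again
    received.append(symbol)
    decoded = lt_decode(received, len(source))
print(f"decoded {len(source)} segments from {len(received)} received symbols")
```

Which symbols were dropped never matters; the receiver simply keeps listening until peeling succeeds.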

I use one @ home by HackHackBoom · 2001-12-13 04:24 · Score: 1

Called Accelerator Pro, the cheap (read: free) version will download the same file from 8 different sites to speed things up. Works great. My slow-ass T1 (sigh, only 160K/second downloads) isn't enough for me anymore when I'm moving my (cough) 500+ meg 'recreational entertainment' files to and from 'casual onlookers'.

(LOL)

--

"It's not stealing if you don't get caught!"

heres a better article by paulydavis · 2001-12-13 04:25 · Score: 1

http://planetanalog.com/news/OEG20010413S0034

Consumer spending by Gerbil912 · 2001-12-13 04:30 · Score: 0, Offtopic

The whole problem with Microsoft is simply they want people to spend more and more on THEM. Yah, just about every business or corporation is driven by consumer spending, but when they go to the lengths to eliminate competition and restrict consumer privileges it gets totally out of hand. Restricting consumer privileges is what this whole issue is about, right? Microsoft wants consumers to spend more on their products by eliminating the option for consumers to spend on a competing product; illegal and unorthodox business practices in the US the last time I checked. Considering Microsoft has made a clone to just about every popular software product, this is obvious (I even hear in XP they threw out Java and are replacing it with C#). Like Microsoft, the music industry too would rather restrict peoples' use of mp3s--ethically or not--and see them buying CDs and such. Ultimately, Microsoft is a monopoly bent on eliminating any competition in order to increase its own revenue, it's using unfair and illegal business practices, and the computer industry would be better if there was an equal competitor.

Anyone remember FSP by krynos · 2001-12-13 04:32 · Score: 1

That was a simpler file transfer protocol similar to FTP, but based on UDP for faster file transfer.
In its time, most of the sites using it were for trading copyrighted software (warez).
After 1995, I've not seen much of FSP.

this is a piece of hardware they're selling?! by nerp · 2001-12-13 04:39 · Score: 1

Strikes me as kind of odd that they need to make a piece of hardware to do this, especially when they say that it "relies instead on the computational power of standard PC processors" and "The translation is straightforward enough to be handled by a Pentium III processor" ... so why not just sell the software to do this? I suppose then they wouldn't get $70,000 a copy for it.
...
ok, I think I'm answering my own question (partially) .. according to their website the "box" they keep referring to in the article is a special server (1.5Gbps and 5 or 85 GB of storage) with their software on it. And you don't need two of these boxes for a complete connection. The server comes with a Fountain Client program to receive transfers. And they sell an SDK to make your own clients as well.

seems like they should've mentioned some of this in the article

Already very common on Usenet, large binaries by the+grace+of+R'hllor · 2001-12-13 04:41 · Score: 1

Usenet files are often distributed with a collection of 'par files'. These files consist of any number of files with an index-like file (*.PAR), and datafiles (*.P01, .P02, etc).

I'm not nearly savvy enough in this area to comment on internals.

Anyway, basically you can replace any number of files smaller than or equal to the number of PAR files. So if you get PAR files P01 through P04, you can replace up to 4 files that have been damaged.

This is cool stuff, since it drastically reduces terminally incomplete large binaries on Usenet servers. Which, of course, I.. uhm... have no part of, and stuff.

Check out an open source client of this thing: http://parchive.sourceforge.net/

--
(does the term Reed-Solomon ring a bell?)

These guys are idiots. by Anonymous Coward · 2001-12-13 04:45 · Score: 0

"FedEx is a hell of a lot more reliable than FTP when you're running 20 Mbytes," said Charlie Oppenheimer, vice president of marketing at Digital Fountain.

Moron. In the banking industry we've been doing FTP transfers of several hundred megabytes every night for years on 486 processors. It's reliable as all hell and fast too.

If you are concerned about bandwidth, compress the data first. That's pretty much what they are doing with their "algorithms" anyway.

Mod parent up. This is what it's all about. by moogla · 2001-12-13 04:45 · Score: 1

It's cool, they transmit packets that contain not only internal error correction information, but information about packets before and after (in decreasing degrees per level of checkbits). By watching the stream for long enough, you can get enough extra information to correct any string of bits (at the expense of decoder memory). This is why they were talking about XOR in the article. There's a lot of stuff about sparse bipartite graphs and modeling the required correction code transmission rates with differential equations in the Tornado codes paper, so this isn't Mickey Mouse technology.

--
Black holes are where the Matrix raised SIGFPE

a clever ploy ...(Re:Link to product) by c0rtez · 2001-12-13 04:47 · Score: 1

This was all just a clever ploy to wring 8 karma points out of one post, wasn't it?
:)

Re:a clever ploy ...(Re:Link to product) by Omnifarious · 2001-12-13 07:05 · Score: 1

Nah, I'm at the cap. :-) Been there for a couple of years now. :-)

--
Need a Python, C++, Unix, Linux develop

Re:Vectors... (Overrated? Puhleeze!) (OT, sorry) by Anonymous Coward · 2001-12-13 04:50 · Score: 0

Uhoh, moderators on crack...

Guess I'll have to break out my red M2 pen.

--
Posting anonymous to protect the innocent (and my karma)

Damn, fluff by bigox · 2001-12-13 04:59 · Score: 1

Is it just me or are all EE Times articles that fluffy? Do I get static cling protection with that?

Morpheus? by mcrbids · 2001-12-13 05:11 · Score: 2

Morpheus allows for downloads from simultaneous sources.

In fact, I've seen my system upload files that I was still downloading!

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Secret sharing/Erasure code by ge · 2001-12-13 05:19 · Score: 1

Some people have compared erasure codes to secret sharing schemes. Secret sharing schemes like Shamir's scheme create shares that are as large as the original data. The good ones have the property that no information is leaked if the number of shares you collect is below a certain threshold.
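For contrast with erasure codes, a minimal sketch of Shamir's scheme. Assumption: arithmetic modulo the prime 2^127 - 1, so the secret must be a nonnegative integer below that. Each share is a full field element, which is why Shamir shares are as large as the secret; in exchange, fewer than `threshold` shares reveal nothing about it.

```python
import itertools
import random

P = 2 ** 127 - 1  # a Mersenne prime; the secret must be smaller than this

def make_shares(secret, threshold, count, rng=random.SystemRandom()):
    """Hide `secret` as the constant term of a random degree-(threshold-1) polynomial."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, count + 1)]

def recover(shares):
    """Lagrange-interpolate the polynomial at x = 0 to read off the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(123456789, threshold=3, count=5)
# Any 3 of the 5 shares recover the secret.
assert all(recover(list(s)) == 123456789 for s in itertools.combinations(shares, 3))
```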

Erasure codes create 'shares' that are smaller than the original data. The ideal erasure code creates shares that are just the size of the original data divided by the threshold value. They also do not necessarily have the property that no information is leaked when you don't have enough shares.

Michael Rabin's Information Dispersal Algorithm is such an ideal erasure code. It's just too slow when you create a large number of shares.

As an aside: by combining secret sharing, erasure codes, and an encryption algorithm you can build a hybrid secret sharing scheme that generates small shares and is computationally secure.

edonkey by Anonymous Coward · 2001-12-13 05:20 · Score: 0

I'm surprised nobody's mentioned the donkey which uses MFTP similar to GetRight.

I had a project to do this two years ago... by _avs_007 · 2001-12-13 05:32 · Score: 1

It was faster than TCP based stuff, but it was a biatch to manage flow control and congestion control.

My take on the situation by athmanb · 2001-12-13 05:33 · Score: 2

When you remove all the marketing talk from that article and the company website, what remains is essentially a protocol which:
a) Uses UDP instead of TCP and implements its own proprietary flow control by sending the data multiple times
b) Has no streaming capabilities whatsoever.

Also, it's pretty evident that no matter how nifty their algorithm is, the data which needs to be transmitted before the file can be reconstructed needs to be at least as big as the size of the original file. Quite probably a lot more because of redundancy since the receiver can't acknowledge data.

I second the original poster's feelings about the protocol...

They mail a ruler... by Christopher+B.+Brown · 2001-12-13 05:35 · Score: 2

I believe this is the scenario where they take the information, compress it, thus achieving a Very Large binary number, let's call it X.

They then find the power of 2 that is just larger than that. That's Y, which is equal to 2^N.

They then take a ruler, 1 foot long, and record on it two pieces of information:

The power of 2 that they're using, N
They then place a mark at the appropriate location to indicate where the location X / Y is, in feet.

Insert ruler into padded envelope, and mail...

Or was this just from some old science fiction story???

--
If you're not part of the solution, you're part of the precipitate.

Isn't this what PAR is all about? by freeweed · 2001-12-13 05:39 · Score: 2

Maybe I'm missing my mark here, but with multipart rar archives, this new PAR feature sounds just like what you're describing. As long as you snag the correct number of files, it doesn't matter which ones you get.. you can recreate the source.

Or am I completely off my rocker here?

--
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.

It seems impossible by ianxm · 2001-12-13 05:40 · Score: 1

I asked this before, but I think I was unclear.

They say that it doesn't matter which packets are received, as long as there are enough of them.

What if you need 10 packets to transfer a file, but only 1 out of every 10 sent gets through, so you receive 10 identical packets? Can you rebuild the original file from 10 identical packets?

If you can't, then how can they claim that they don't care which packets get through?
If they can, then why would they bother sending more than one packet?
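The worry is well founded for a naive scheme: what a decoder needs is k linearly independent packets, and ten copies of one packet carry the information of one. A rateless sender sidesteps this by drawing a fresh random combination for every packet, so repeats are unlikely. A sketch, assuming packets are XOR combinations of source blocks described by bitmasks and drawn uniformly at random (real fountain codes use sparser, carefully tuned distributions):

```python
import random

def gf2_rank(masks):
    """Rank over GF(2) of coefficient vectors stored as integer bitmasks."""
    basis = {}  # highest set bit -> basis vector with that leading bit
    for m in masks:
        while m:
            top = m.bit_length() - 1
            if top not in basis:
                basis[top] = m
                break
            m ^= basis[top]  # cancel the leading bit and keep reducing
    return len(basis)

K = 10  # the file is split into 10 source blocks
identical = [0b1] * K                   # ten copies of the same packet
distinct = [1 << i for i in range(K)]   # one packet per source block
print(gf2_rank(identical), gf2_rank(distinct))

# A fountain-style sender draws a fresh random combination for every packet.
rng = random.Random(42)
received = []
while gf2_rank(received) < K:
    received.append(rng.randrange(1, 1 << K))  # any nonzero mask over the K blocks
print(len(received), "random packets were enough")
```

Ten identical packets give rank 1, so the file cannot be rebuilt; with random combinations the receiver typically needs only slightly more than K packets, regardless of which ones were lost.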

I've a better way! by Anonymous Coward · 2001-12-13 05:52 · Score: 0

"A new file transfer protocol talked about in EE Times gets data from one place to another without actually transmitting the data..."

A bulk of 100,000,000 keyboard trained monkeys.
Then obtaining the original file is just a matter of time.

"New" method of increasing downloads my a__ by Hyped01 · 2001-12-13 05:53 · Score: 1

The program or company that (I know of) offered a program to do this is called Download Accelerator.

It works when the following criteria exist:
(1) You are not already maxing out your connection.
(2) The server is limiting your bandwidth
(happens if it load balances or just limits each connection)
(3) (or as opposed to #2) there is more than one server in their server pool.

What the program does is set an offset and connect to the same server (assuming there is only one web server to request from) or from multiple servers, and retrieve different portions of the file.
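The offset trick maps onto HTTP/1.1 byte-range requests. A sketch of how a client might carve a file into one `Range` header value per connection, assuming the server honours ranges; the function name is hypothetical and this is not Download Accelerator's actual code.

```python
def byte_ranges(file_size, connections):
    """Split bytes [0, file_size) into one inclusive HTTP Range value per connection."""
    if not 0 < connections <= file_size:
        raise ValueError("need between 1 and file_size connections")
    chunk, extra = divmod(file_size, connections)
    ranges, start = [], 0
    for i in range(connections):
        length = chunk + (1 if i < extra else 0)  # spread the remainder over the first chunks
        end = start + length - 1                  # Range end offsets are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges

print(byte_ranges(1000, 3))  # ['bytes=0-333', 'bytes=334-666', 'bytes=667-999']
```

Each connection then sends its own request with one of these values in a `Range:` header, and the client writes the responses at the matching offsets.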

Thus, if you were not at your max bandwidth downloading normally, you will get closer to it. Simple math. If there are 100 users on a server that can support 100Mb/s, you are getting 1Mb/s. If you spawn 10 connections, you get 9Mb/s (roughly). If you are at your max bandwidth though, or the server you are connecting to is sending as fast as possible, you are actually increasing the amount of time it will take to download the file.
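The "simple math" above, worked through under the stated assumption that the server splits its bandwidth evenly per connection (the function names are hypothetical):

```python
def my_throughput(server_mbps, other_conns, my_conns):
    """Your total rate if the server shares bandwidth evenly across all connections."""
    return server_mbps * my_conns / (other_conns + my_conns)

def others_per_conn(server_mbps, other_conns, my_conns):
    """What each other user's single connection gets under the same assumption."""
    return server_mbps / (other_conns + my_conns)

print(f"{my_throughput(100, 99, 1):.2f}")     # you plus 99 others, one connection each
print(f"{my_throughput(100, 99, 10):.2f}")    # you open 10 connections instead
print(f"{others_per_conn(100, 99, 10):.2f}")  # everyone else's new share
```

With 10 connections out of 109 you get 1000/109 ≈ 9.17 Mb/s, while every other user drops from 1 Mb/s to about 0.92 Mb/s, which is the cost to other users described below.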

Also, these programs (to date, the ones we have experienced on our servers) do not take into account a single web server that is not limiting bandwidth, and in not doing so, consume the resources of multiple users for single files. Too many people using them, and it becomes a web administrators' nightmare. While on a single page, 1-4 connections will generally serve a client's requests, you now have sometimes many times that for a single file.

Additionally, as well as server resources being consumed as much as 20 times higher than normal for a single user, it also means that (on servers that balance speed per connection) your other users are suffering slower downloads due to it.

We will soon be utilizing some custom scripts on our servers to prevent such actions from occurring (if a connection to a file exists, they will refuse future connections until that connection completes, or sever the original).

- Robert

--

WebMaster:
BinFeeds
XXX Thumbnailed Image Newsgroups but

Is this like Usenet .PAR files? by ScottBob · 2001-12-13 05:54 · Score: 2

Lately in the binaries newsgroups, there has been increasing use of .PAR files. If a large number of .RAR or .ACE files are downloaded but some are incomplete or corrupt, then as long as you have at least as many .PAR files as there are missing/corrupt files, the original data can be reconstructed, no matter which of the original files are missing. An explanation and a PAR file reconstruction program can be found at:
http://www.disc-chord.com/smartpar/
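The "any N of the .PAR files" property comes from Reed-Solomon style coding. Below is a toy Python sketch of the idea, not the PAR format itself: each byte position across k files is treated as points on a polynomial over the prime field GF(257) (an assumption for simplicity; real PAR tools use a binary Galois field), and the parity values are extra points on the same polynomial. Any k of the k + m blocks pin the polynomial down. Parity values can reach 256, so they are kept as ints rather than bytes.

```python
P = 257  # prime modulus: every byte value 0..255 is a field element


def _lagrange_eval(points, x):
    """Evaluate the unique polynomial of degree < len(points) through `points` at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def make_parity(blocks, m):
    """blocks: k equal-length lists of byte values. Returns m parity blocks."""
    k = len(blocks)
    parity = [[] for _ in range(m)]
    for col in zip(*blocks):
        pts = [(x + 1, v) for x, v in enumerate(col)]             # data at x = 1..k
        for r in range(m):
            parity[r].append(_lagrange_eval(pts, k + 1 + r))      # parity at x = k+1..k+m
    return parity


def recover(received, k, length):
    """received: {index: block}, indices 0..k-1 are data, k.. are parity. Needs any k."""
    if len(received) < k:
        raise ValueError("need at least k blocks")
    chosen = sorted(received.items())[:k]
    data = [[] for _ in range(k)]
    for pos in range(length):
        pts = [(idx + 1, blk[pos]) for idx, blk in chosen]
        for i in range(k):
            data[i].append(_lagrange_eval(pts, i + 1))
    return data


files = [list(b"ABCD"), list(b"EFGH"), list(b"IJKL")]
par = make_parity(files, 2)
restored = recover({0: files[0], 3: par[0], 4: par[1]}, k=3, length=4)  # files 1 and 2 lost
print(bytes(restored[1]), bytes(restored[2]))
```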

Swarmcast by mvw · 2001-12-13 05:55 · Score: 2

Mirror sites enable client requests to be serviced by any of a number of servers, reducing load at individual servers and dispersing network load.

But where do you get the mirror sites from? For example, when a new Star Wars trailer hits the net?

An interesting idea here is to push the server workload down onto the clients:
Swarmcast is such a protocol. It uses forward error correction as well, and I believe some of the people whose names were mentioned in this discussion are involved in it too.

Any expert who can comment on this one?

Re:Archie .vs. Veronica by Medievalist · 2001-12-13 06:03 · Score: 2

/.
Well, RECENTLY, Veronica was mouse-driven.

Of course, if you weren't a young pup still wet behind the ears, you'd remember that Archie in _Archie_Comics_ had that checker-board thingy on the sides of his head, and Veronica didn't. And I've heard that Archie was latently homosexual, and Veronica wasn't (that's why he was always chasing the unattainable Veronica instead of scoring with hot-to-trot Betty).

I'm sure Bhil will correct me if I'm wrong.
--Charlie

Bittorrent does that! by justin_squinky · 2001-12-13 06:04 · Score: 2, Interesting

Bittorrent does that! It's not really trying to be a Napster or Gnutella, but a way to let people host a high-volume website without having a lot of bandwidth. It works seamlessly with Mozilla and IE. It's also quite fast, unlike the anonymous p2p networks.

Standardization Efforts by Anonymous Coward · 2001-12-13 06:05 · Score: 1, Informative

For all that, not a mention of this link from the Digital Fountain site itself?

http://www.digitalfountain.com/technology/library/standards.htm

Current topic sounds like... by Hyped01 · 2001-12-13 06:06 · Score: 1

Sorry if my last post didn't clearly indicate it was in response to the author's question about an earlier company/software package that claimed to do something similar using multiple connections. I intended this as a two-post reply so that anyone who wanted to could respond to either part of my comments separately.

As for the current "new" method of decreasing download time... it seems to me it's called file compression. Isn't that what gzip on a server does? Compress the file using a "mathematical equation", and then the client browser decompresses it on its end using "the computational power of standard PC processors"? And as with a zipped or gzipped file, technically, none of the data is ever transmitted. Instead, a mathematical algorithm is used to compress and decompress, saving transmit time.
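For comparison, this is what gzip-style compression looks like with Python's standard `gzip` module: it removes redundancy, so it shrinks repetitive data but cannot shrink data that is already random (or already compressed). A forward error correction scheme goes the other way and adds controlled redundancy.

```python
import gzip
import os

text = b"UDP + Math = Fast File Transfers. " * 200   # highly repetitive input
noise = os.urandom(len(text))                      # no redundancy left to remove

for label, payload in (("repetitive", text), ("random", noise)):
    packed = gzip.compress(payload)
    assert gzip.decompress(packed) == payload      # lossless round trip
    print(f"{label}: {len(payload)} -> {len(packed)} bytes")
```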

Sounds to me like a fancy new name for a possibly new method of compressing data.

Of course, as the article on their website is lacking in technical information, who knows? But that's where my money would be. If not, I'd guess it's a combination of that plus shortcuts for commonly sent compressed data. What I mean is, they could use something similar to a macro in a word processor: define small "macro" tags for certain byte sequences they have found to appear frequently in files, and send those tags along with the other compressed data. Of course, if they implement it that way, it's just another form of compression as well...

- Robert

--

WebMaster:
BinFeeds
XXX Thumbnailed Image Newsgroups but

Why don't you disappear? by Anonymous Coward · 2001-12-13 06:12 · Score: 0

idiot

K5 + 6 months = slashdot story by Anonymous Coward · 2001-12-13 06:20 · Score: 0

This was up on kuro5hin months ago. The same type of discussion: excitement by the uninformed, background from the CS people. In the end I don't think it made the section headlines, let alone the whole site.

Maybe like this... by DrCode · 2001-12-13 06:23 · Score: 2

Client: Requesting 'kword'.
Server: Kword binary: 1.2Mb; Kword source .5Mb.
Sending source.
Client: Source received. Unpacking. Compiling.
Done.

Unfortunately... by Galvatron · 2001-12-13 06:42 · Score: 1

on average, the digit number would be the same length as the actual code. Hence, this would only save space on half the files. If you also sent the end digit, then it would almost never save space.

--
"The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD

People Magazine? Oh no, EEtimes, huh? by sammyo · 2001-12-13 06:45 · Score: 1

What's amazing is an article this bad from a respected technical magazine. EEs can't handle the concept of FEC? Well, maybe. Anyway, most of the readers here have probably spent twice as much time on this as the writer did.

If there's redundancy, there has to be overhead by Animats · 2001-12-13 06:54 · Score: 3, Insightful

The article claims that there's no overhead added, in terms of extra data, to transmit data in this way. That has to be wrong. If you add redundancy, you have to add bits. Maybe not very many, but some. Basic theorem from Shannon.

I'd guess they have to send about 2x as much data as the original for this to work. Maybe they're applying a compressor (like gzip) up front, and taking credit for that.

As others have pointed out, there are lots of other schemes in this class. Bear in mind that this is a protocol for broadcast and multicast, not general file transfer.

In a word, it's like mp3 by Anonymous Coward · 2001-12-13 07:02 · Score: 1, Insightful

Basically it is a non-destructive version of mp3ing music.

You convert a digital document into a mathematical function which when applied retrieves the elements one by one composing the document.

MP3 does the same, but in addition it removes what you don't hear to make the mathematical function easier to produce (actually it is discrete values for a Fourier equation that are calculated, but the idea is the same).

Now, as to how they can charge so much, I have no clue. I am sure a skilled university student could do the same... unless there is a new 'patent protected' sticker on the package.

Artaxerxes

Re:In a word, it's like mp3 by DaCool42 · 2001-12-13 09:11 · Score: 0

No, it's not just data compression. If you cut a bit of data out of the middle of an mp3, the song still works, minus that part. What they are talking about is a protocol where if any part of it is missing, the whole thing doesn't work. Therefore, all you need to do is receive data until it is complete; no integrity checking is required.

--

----
All of whose base are belong to the what-now?

Tear Drop File Pool, no PUBLIUS! by transami · 2001-12-13 07:03 · Score: 1

I played around with something called File Pool/Tear Drop or something like that. But the real daddy of such systems is Publius. Publius distributes files across sets of servers with redundancy and encryption. It is possible for the owner of a document to restrict access and prevent deletion of the file, even by him/herself. It is a very cool system. It would be nice to see it become more standardized and utilized.

--
:T:R:A:N:S:

Integer math -- even simpler than that! by Mr+Z · 2001-12-13 07:14 · Score: 1

Read the article. They're just using XOR. It's like using RAID 4 parity blocks, except they're doing it on a file transfer instead of on disk blocks.

(I'm oversimplifying a bit, but really their approach doesn't sound all that special.)
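A minimal Python sketch of the RAID 4 style XOR parity mentioned above, assuming equal-length blocks. A single parity block can rebuild any one missing block, but not two; tolerating more losses needs a stronger code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID 4 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


data = [b"blk0", b"blk1", b"blk2"]
parity = xor_blocks(data)

# Lose any ONE data block: XOR of the survivors and the parity rebuilds it.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)  # b'blk1'
```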

--Joe

--
Program Intellivision!

Re:Integer math -- even simpler than that! by Deflatamouse! · 2001-12-13 09:36 · Score: 1

I think their approach is very similar to how a CD-ROM copes when it cannot read a certain portion of a CD, e.g. due to scratches.

A standard CD-ROM can hold much more than 650 MB of raw data, but a portion of it is used to store some sort of checksum (you can look up the details of exactly how it works yourself), leaving 650 MB for the user. Using this information, your computer (CD-ROM driver?) can reconstruct a certain amount of missing information that it could not read off your CD.

Now, I don't think the algorithm used on CDs for data recovery is very aggressive. There are much more aggressive error-correcting codes that let you recover more bits of data (or guard against certain types of corruption) with fewer overhead bits. This is what I think they are doing. Correct me if I'm wrong.
Re:Integer math -- even simpler than that! by RackinFrackin · 2001-12-13 11:59 · Score: 2, Informative

Yes. CDs encode data using two Reed-Solomon codes. Each sample (44,100 per second) records two amplitude measurements (one for each channel) as 16-bit words, so there are 4 bytes per sample. Measurements from 6 consecutive samples are grouped together and encoded with a (28,24) Reed-Solomon code over GF(2^8), which adds 4 bytes of redundancy to the 24 bytes of data. After that, the 28 resulting bytes are interleaved, then encoded with a second, shortened (32,28) Reed-Solomon code that adds another 4 bytes of redundancy. Then one more byte is added that is used for some kind of control purpose (I don't know exactly). After that, each byte is translated into a 14-bit pattern to make physical reading/writing more feasible, three more bits are added after each translated byte, and finally a 27-bit word is added for synchronization.

Overall, 6 "ticks" of stereo music contain 192 bits of data, which are expanded to 588 bits written to the disc. The end result is that any error burst of 8871 or fewer bits on the disc can be corrected.
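The frame arithmetic above can be checked with a few lines of Python, using the byte counts from the post (24 audio bytes, two 4-byte Reed-Solomon stages, one control byte, 14 + 3 channel bits per byte, 27 sync bits). It comes to 192 audio bits per 588 channel bits.

```python
SAMPLES = 6                        # stereo sample pairs per frame
AUDIO_BYTES = SAMPLES * 2 * 2      # 2 channels x 16 bits = 24 bytes
AFTER_FIRST_RS = AUDIO_BYTES + 4   # first Reed-Solomon stage: 28 bytes
AFTER_SECOND_RS = AFTER_FIRST_RS + 4  # second Reed-Solomon stage: 32 bytes
WITH_CONTROL = AFTER_SECOND_RS + 1    # plus the control byte: 33 bytes
CHANNEL_BITS_PER_BYTE = 14 + 3     # 8-to-14 translation plus 3 extra bits
SYNC_BITS = 27

data_bits = AUDIO_BYTES * 8                                   # 192
disc_bits = WITH_CONTROL * CHANNEL_BITS_PER_BYTE + SYNC_BITS  # 33 * 17 + 27 = 588
print(data_bits, disc_bits, f"rate = {data_bits / disc_bits:.3f}")
```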

Junk or Genius? by Spazmania · 2001-12-13 07:22 · Score: 2, Interesting

This could be true genius or mediocre junk depending on the details. That article wasn't clear: Does it have to be N specific packets (in any order) or can it be any N of the transmitted packets?

N specific packets is, well, nothing special. The idea of using negative acknowledgements instead of positive acknowledgements and retransmitting only the missing pieces is nothing new. It was used extensively in pre-Internet days on dedicated connections. Ever heard of Z-Modem? It's been avoided on the Internet because it makes the congestion control problem very difficult. Translating it into symbols is really just saying that they also compress the file first. Whoop de doo.

On the other hand, if it can be ANY set of N of the transmitted packets, well, that's downright incredible. The applications for such a technique are boundless... Everything from finally making multicast viable and desirable to latency and congestion indifferent file transfers to ultra-reliable offline storage.
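The "any N" behaviour is what fountain-style erasure codes aim for. Here is a Python sketch of the simplest member of that family, a random linear fountain over GF(2); it is not the Tornado code a later comment says Digital Fountain uses, and `encode_packet`/`FountainDecoder` are illustrative names. Each packet is the XOR of a random subset of source blocks, tagged with a bitmask of which blocks it contains, and the receiver runs Gaussian elimination until it holds N independent packets. Which packets arrive does not matter, though it typically needs a packet or two more than N because some random combinations turn out redundant.

```python
import random


def encode_packet(blocks, rng):
    """One fountain packet: XOR of a random nonzero subset of the source blocks.

    Returns (mask, payload); bit i of mask says whether block i is included.
    """
    k = len(blocks)
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(k)
    payload = 0
    for i in range(k):
        if (mask >> i) & 1:
            payload ^= blocks[i]
    return mask, payload


class FountainDecoder:
    """Incremental Gaussian elimination over GF(2); packet order does not matter."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # pivot bit -> (mask, payload), kept fully reduced

    def add(self, mask, payload):
        for bit, (m, p) in self.rows.items():      # clear existing pivot bits
            if (mask >> bit) & 1:
                mask ^= m
                payload ^= p
        if mask == 0:
            return                                 # redundant: no new information
        pivot = mask.bit_length() - 1
        for bit, (m, p) in list(self.rows.items()):  # clear new pivot from old rows
            if (m >> pivot) & 1:
                self.rows[bit] = (m ^ mask, p ^ payload)
        self.rows[pivot] = (mask, payload)

    def done(self):
        return len(self.rows) == self.k

    def blocks(self):
        # Fully reduced: row i now has mask == 1 << i, so its payload is block i.
        return [self.rows[i][1] for i in range(self.k)]


message = b"UDP + Math = Fast File Transfers"   # 32 bytes
k, size = 8, 4
source = [int.from_bytes(message[i * size:(i + 1) * size], "big") for i in range(k)]

rng = random.Random(2001)
dec = FountainDecoder(k)
sent = 0
while not dec.done():
    mask, payload = encode_packet(source, rng)
    sent += 1
    if rng.random() < 0.3:       # simulated 30% loss; nothing is ever re-requested
        continue
    dec.add(mask, payload)

out = b"".join(b.to_bytes(size, "big") for b in dec.blocks())
print(out, f"after {sent} packets sent for {k} source blocks")
```

The loss rate never enters the decoder: the receiver just keeps collecting until `done()`. Practical codes such as LT or Tornado codes use sparse subsets so that decoding is much cheaper than this dense elimination.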

How would you like a web server that can serve a T3's worth of clients off of a T1 and do it in such a way that the smart web browser can pre-cache data from the server in anticipation of the reader's next click, realtime adaptive based on the documents currently in multicast transmission to somebody else? Oh yeah, the web server can be on the Moon and respond to you as fast as if it was across the street. Genius.

So which is it? Any set of N or a specific set?

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.

An obligatory Freenet post by LM741N · 2001-12-13 07:26 · Score: 1

There was a time that I might be able to compare Freenet to such a scheme. However, Freenet has become so complicated, even some of the developers don't seem to fully understand how it works :)

But I can attest that it does work. It's just no blazing-fast network. Rob.

Re:Archie .vs. Veronica by Anonymous Coward · 2001-12-13 07:30 · Score: 0

you've been watching too many Kevin Smith movies.

get a life. let me know how it goes too...I may decide on getting one someday....

Might I suggest rsync... by cduffy · 2001-12-13 07:34 · Score: 2

...as a suitable high-level system for those concerned about data corruption in file transfer? (The P2P system Electronic Donkey 2000 also incorporates a great deal of high-level error checking -- discarding and retransmitting file blocks whose checksums fail and even those which were corrupted on disk since the beginning of the transfer in the event of a restarted transfer).
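A sketch of the verify-and-refetch idea in Python, assuming the sender publishes a list of per-block SHA-256 hashes (a hypothetical manifest format; eDonkey and rsync each use their own). This is not rsync's rolling-checksum algorithm, only the simpler step of finding which blocks failed and must be fetched again.

```python
import hashlib

BLOCK = 4  # tiny block size for the demo; real tools use far larger blocks


def block_hashes(data: bytes, block: int = BLOCK) -> list[str]:
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def bad_blocks(received: bytes, expected: list[str], block: int = BLOCK) -> list[int]:
    """Indices of blocks that are missing or whose hash does not match."""
    got = block_hashes(received, block)
    return [i for i, h in enumerate(expected) if i >= len(got) or got[i] != h]


original = b"abcdefghijkl"
manifest = block_hashes(original)       # published by the sender
corrupted = b"abcdXfghijkl"             # one byte damaged in transit or on disk
print(bad_blocks(corrupted, manifest))  # [1]: only that block is re-requested
```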

A new invention by Peaker · 2001-12-13 07:36 · Score: 1, Redundant

Called compression?

Check out TSBN by Anonymous Coward · 2001-12-13 07:39 · Score: 1, Interesting

A company located at www.tsbn.com has been doing this since at least 1996, and their IP filings go back even further. They claim to do it all in software without devices. Guys from some of the companies mentioned above have stock in TSBN... interesting.

RFCs by Cato+the+Elder · 2001-12-13 08:24 · Score: 2

There's a difference between "stick with what we know works" and "Stick with the RFCs." RFCs are dynamic. The TCP and UDP ones have both gone through several revisions. The difference is if you stick with the RFCs you have a bunch of engineers from a bunch of different companies, schools, etc. look at your protocol and make sure it can be implemented from the spec. They also try to balance out its effect on internet traffic as a whole.

UDP is a perfectly legitimate method of moving data, especially if you can tolerate loss. It has been the general position of the IETF for years that you shouldn't reimplement retransmission; that's TCP's job. However, from the descriptions it sounds more like this is a smart form of connection blasting, which isn't reinventing the wheel but can be hard on networks.

Speaking of that, it isn't the network that breaks because of UDP; it's your program. If you start sending large amounts of UDP data and chewing up the bandwidth on my router, according to the spec I can stop even attempting to process the packets. This is the problem with blasting protocols: by taking up more bandwidth than TCP for the same data transfer rate, they invite administrative blockage.

Bandwiz software of Digital Fountain appliance ?! by Anonymous Coward · 2001-12-13 08:35 · Score: 1, Informative

Check out Bandwiz's (www.bandwiz.com) software-based products; it seems they have far greater capabilities than Digital Fountain.

Swarm Cast by Anonymous Coward · 2001-12-13 08:53 · Score: 1, Informative

I believe OpenCola's Swarmcast is the info you were looking for. I remember it from a story on /. a while back; anyway, here's the link:
http://www.opencola.com/products/4_swarmcast/
Enjoy,l8r.

Everyone seems to be missing the point... by DaCool42 · 2001-12-13 09:18 · Score: 0

They are NOT talking about downloading from multiple servers simultaneously (although this protocol would make that easy to do). What they are talking about is a method of transmitting data that doesn't require its integrity to be verified. The receiving end doesn't need to keep saying "got the packet...got the packet", but rather sits there receiving until it has everything. Basically it's UDP with a built-in integrity check that requires no additional communication.

--

----
All of whose base are belong to the what-now?

This concept appears the same as PAR files. by Anonymous Coward · 2001-12-13 09:28 · Score: 0

This concept seems to be the same as the PAR file concept often used in USENET binaries postings. PAR files are used to reconstruct split files. If you are missing x files from a set, you can reconstruct them if you have any x PAR files. This is useful because it allows one PAR file to replace any missing file in a set. This article seems to be applying a similar method to file transfers. More information and links are available from Smartpar, the popular Windows tool for PAR files: http://www.disc-chord.com/smartpar/index.html

Re:I've wored in at least 2 positions, by Anonymous Coward · 2001-12-13 09:35 · Score: 0

WORED is a very interesting misspelling, from the context I expect you meant to type 'worked' but any of the following might also be appropriate:
Worked
Woried
Whored
Wired
Warred

Deja vu by Anonymous Coward · 2001-12-13 09:44 · Score: 0

Why does this remind me of that company who claimed they could compress a 1MB image down to 1 byte? Seems too good to be true.

REALLY overestimating themselves by 8onal · 2001-12-13 09:49 · Score: 1

In this case, the Transporter Fountain creates not equations but hundreds of millions of "symbols" which can be used to reconstruct the data.

"Symbols..." like zeroes and ones??

Re:REALLY overestimating themselves by Hast · 2001-12-13 10:23 · Score: 1

No, not necessarily. When speaking about compression, encryption and coding, the term "symbol" generally represents a block of bits, like 97 in ASCII representing "a".

The size of the block is generally different for different symbols. This is the basic idea behind compression: let common sequences be represented by a shorter bit sequence, a kind of shorthand if you will.

What they do here is take an original file, generate a new sequence of symbols which represent the same file, but with redundancy. (The opposite of compression, you add information. (Not willy nilly naturally)) The receiver will then not need ALL of the generated "sequence of symbols" only enough to determine the original file uniquely.

If you're interested in it, look up "Huffman coding" for an example of elegant coding. That might clear some details up. (That is for compression though, not for adding redundancy.)
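For the curious, a compact Huffman coder using Python's `heapq`. On "abracadabra" the frequent "a" gets a 1-bit code and the whole string takes 23 bits instead of 88 in 8-bit ASCII.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix-free code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if len(freq) == 1:                        # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code, len(encoded), "bits vs", 8 * len(text))
```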

FEC very common in wireless by Anonymous Coward · 2001-12-13 09:57 · Score: 1, Insightful

This has nothing to do with compression or cryptography. Basically these guys are using Forward Error Correction (FEC) which amounts to adding enough redundancy to the bitstream that the message can be recovered at the other end even if a certain number of bits are lost along the way. They are assuming that the extra bits required for the redundancy is less than the extra "back-and-forth" bits required for FTP over TCP. If such is the case, then you get better throughput.

FEC is widely used in transmission media like cell phones and cable modems. These channels regularly introduce all kinds of distortion and interference. It's the use of FEC that allows you to still hear conversations despite the interference.

There are basically two kinds of FEC -- block codes and convolutional codes. Block codes are just that. They take in a block of bits (say N) and spit out (N+K) bits. The K is your added redundancy, more or less. Convolutional codes operate like finite state machines. They continually take in a stream of bits and spit out more bits at a higher rate. Tornado codes, which these guys use, are (I believe) a type of block code. Most digital cell phones use a convolutional code. In fact, Andrew Viterbi, a co-founder of Qualcomm, is the "inventor" of the algorithm by which convolutional codes are decoded.
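To make the block-code idea concrete, here is a Hamming(7,4) code in Python: N = 4 data bits in, N + K = 7 bits out, and any single flipped bit can be corrected. It only illustrates the shape of a block code; Tornado codes deal with whole lost packets (erasures) rather than flipped bits, and are far more elaborate.

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]    # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]    # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]    # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                           # flip one bit "in transit"
print(hamming74_decode(word))          # [1, 0, 1, 1]
```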

Hope this helps.

This was done over a decade ago. by The+Panther! · 2001-12-13 10:00 · Score: 2

What's the big deal? Direct connections did this on BBSes a long time ago: simply stream all the data without flow control (XModem and early YModem required ACK/NAK), and let the client request the packets it is missing at its leisure. ZModem was the first to allow out-of-order delivery, from what I recall, and it revolutionized transfers by making full use of the available bandwidth.

The only thing this particular implementation adds is connecting at an arbitrary time and listening to a continuous loop of packets until you have them all. ZModem could easily be modified to do exactly this, except with checksum data per packet rather than XORing chunks of packets or using symbolic representations. I'm not read up on FEC, but if you're transmitting already compressed data (close to the entropy limit), no alternate representation is more efficient than sending the raw data.

--
Any connection between your reality and mine is purely coincidental.

FSP Enhanced?! by Anonymous Coward · 2001-12-13 10:14 · Score: 0

This sounds like enhancing the good old FSP protocol with some forward error correction. No big deal, right? :)

This works (it isn't magic) by kanenas · 2001-12-13 10:40 · Score: 1

More info at:
http://planetanalog.com/news/OEG20010413S0034

What they have done is create a scheme where a server sends the SAME stream (broken into many smaller ones) to multiple clients via multicast.

Even if you lose a packet, you don't ask for retransmission; you keep listening, because of the Forward Error Correction in the stream (lots of math here). There is no control information going back and forth. Their client just listens and never sends anything except when it is done.

This essentially makes it possible to have a single server for thousands of clients (the article above talks about 10,000 VHS-quality streams at 300 kbit/s).

This is for real. I've read a lot of their papers on ResearchIndex. They mean business and they have a viable product. Being able to send video (large files) to thousands of clients at the same time is beyond Akamai's abilities too.

kanenas

very common (albeit lossy) for multicast media by Anonymous Coward · 2001-12-13 11:01 · Score: 0

and lossless file distribution.

http://research.microsoft.com/barc/mbone/fcast.htm

UDP unreliable? not really... by MavEtJu · 2001-12-13 11:18 · Score: 1

"UDP drops packets", "UDP is unreliable", and other stuff...

It's the network and IP layers that drop packets; they don't care whether it's IP/IPX/DECnet or TCP/ICMP/UDP.

The difference between TCP and UDP is that TCP is session oriented (i.e. the application layer gets the data as a stream instead of per packet and doesn't have to worry about ordering, missing packets et al.) and UDP is not (i.e. the application layer gets the data in order of arrival and has to handle missing packets and sequencing itself).

Otherwise, they are treated the same at the network and IP layers; the only difference is who is responsible for error correction and sequencing.
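As an illustration of what "the application has to worry about" means, here is a Python sketch of sequencing done in user space. The 4-byte big-endian sequence header is a hypothetical format; in a real program the datagrams would come from `socket.recvfrom()`.

```python
import struct

HEADER = struct.Struct("!I")  # hypothetical header: 4-byte big-endian sequence number


def make_datagram(seq: int, payload: bytes) -> bytes:
    return HEADER.pack(seq) + payload


def reassemble(datagrams, total: int):
    """Put datagrams back in order. Returns (data or None, list of missing seqs)."""
    chunks = {}
    for dg in datagrams:                          # arrival order, possibly shuffled
        (seq,) = HEADER.unpack_from(dg)
        chunks.setdefault(seq, dg[HEADER.size:])  # duplicates are ignored
    missing = [s for s in range(total) if s not in chunks]
    if missing:
        return None, missing                      # caller decides: re-request, or rely on FEC
    return b"".join(chunks[s] for s in range(total)), []


parts = [b"UDP ", b"is ", b"fine"]
wire = [make_datagram(i, p) for i, p in enumerate(parts)]
print(reassemble([wire[2], wire[0], wire[1]], 3))   # (b'UDP is fine', [])
print(reassemble([wire[2], wire[0]], 3))            # (None, [1])
```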

--
bash$ :(){ :|:&};:

Re:UDP unreliable? not really... by Tazzy531 · 2001-12-13 11:29 · Score: 1

But I think the point of it is that since it is coming from a variety of sources, it doesn't matter if one packet gets dropped when it could be attained from another. This was the way that iMesh worked. I'm not sure if they are still around.

Secondly, by using UDP and a variety of sources, you speed up the transfer by removing the ACKs (acknowledgments) used in TCP. Granted, UDP requires more processing afterwards than TCP, but today's systems can surely handle it.

--

_______________________________
"I'm not Conceited...I'm just a realist..."

Re:I've wored in at least 2 positions, by Thomas+Charron · 2001-12-13 11:55 · Score: 1

DoH! Shows me for not proofreading the darned post. Worked.

--
-- I'm the root of all that's evil, but you can call me cookie..

Freenet by mOdQuArK! · 2001-12-13 12:44 · Score: 2

Actually, if you think about it, this kind of approach would work pretty well for an encrypted/distributed sharing network like Freenet, where individual nodes of the network may or may not be available. You can take all of the pieces of a given file and spew them to lots of different parts of the network, then a client can just go around asking different nodes in the network for any pieces of that file until it has collected enough of them to form the whole.

Dont forget Swarmcast by Anonymous Coward · 2001-12-13 13:52 · Score: 0

Being a sourceforge project and being mentioned on Slashdot before this was probably what Michael was thinking about.

Stream Theory by Anonymous Coward · 2001-12-13 14:21 · Score: 0

User would go to download a game demo or something, receive pieces from several different places, and knit them together?

It sounds kind of like Stream Theory is what you're referring to. GetRight doesn't use UDP AFAIK and has nothing in particular to do with game demos.

AAAAAGH!!!! by peccary · 2001-12-13 14:43 · Score: 2

why do people have to take a technology which is perfectly reasonable and useful within a certain problem domain, and turn it into snake oil?

I read that whole article, just praying I would see the name "Shannon" or the word "congestion" in there, or even TNSTAFL, but nooooo.

What these guys are doing is absolutely wonderful when

- the bandwidth-delay product is greater than the size of the data to be transferred

- upstream bandwidth is far more expensive than downstream bandwidth

- the receiver is stealthed

- the communications link is half-duplex.

- the receiver has cheap, fast*, temporary storage available. (faster than the network, anyway).

- occult knowledge is being transmitted, and the parties wish to have some deniability

What they are doing is absolutely horrible when the network links are shared with other people -- unless they have some other means of congestion control.

That's not unreasonable. Couldn't somebody just say as much?
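The first condition is easy to quantify. A sketch in Python with hypothetical numbers for a geostationary satellite hop (10 Mb/s, about 600 ms round trip): a 500 KB file is smaller than the bandwidth-delay product, and a TCP sender without window scaling cannot keep even 1 Mb/s in flight over such a link.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    return bandwidth_bps * rtt_s / 8


RTT = 0.6                            # seconds, hypothetical satellite round trip
in_flight = bdp_bytes(10e6, RTT)     # 10 Mb/s link -> about 750,000 bytes
file_size = 500_000                  # hypothetical 500 KB file
CLASSIC_TCP_WINDOW = 65_535          # largest TCP window without window scaling

print(f"BDP: {in_flight:,.0f} bytes; whole file fits in flight: {file_size < in_flight}")
print(f"Unscaled TCP tops out at {CLASSIC_TCP_WINDOW * 8 / RTT / 1e6:.2f} Mb/s")
```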

how extremely stupid by nobody/incognito · 2001-12-13 15:00 · Score: 1

somebody forgot to tell these yokels that they fixed this problem in tcp years ago.

i wish them luck in their ipo

nobody

--
parturiunt montes, nascetur ridiculus mus

Very cool for RAID by Bitmanhome · 2001-12-13 15:05 · Score: 1

I had this idea not long ago -- FEC should be used for RAID! It would be cool to have an array of 50 drives be able to tolerate the loss of any 5 with no loss of data.

-B

--
Not that this wasn't entirely predictable.

This press release loses. by Harik · 2001-12-13 17:45 · Score: 1

... for a number of reasons.

Loss #1: The comparison to "FTP".
Had they said "HTTP", people would realize, "Oh yeah, I download stuff off the web all the time, it works fine here. Why should I spend 150 grand?" It's a pure PR trick. They're talking about ACK delays on high-latency TCP links. The problem is better solved with a modern TCP stack at both ends (i.e. put an active TCP proxy at both ends, or even just at the sending end, and your traffic flows much better).

Loss #2: "we don't require data to be acknowledged"
... translates to "If your machine crashes, we'll just keep sending UDP packets forever." So it _DOES_ need to be acknowledged; they just don't admit it because it sounds cooler.

Loss #3: "Transforms the data into a recipe..."
... much the same way gzip does. Or perhaps they mean something more like RAID 5 equations on your data. Either way, they're spending a LOT of CPU power on a non-solution.

Loss #4: Lack of feedback on net conditions along your path leads to overall slower delivery.
Yes, amazing as it may seem, sending data faster can mean it gets there slower. The net is a series of leaky pipes... build up too much pressure and packets escape (get dropped). Simply blasting UDP packets at the destination tends to waste a LOT of bandwidth, and more than likely will take longer than simply hitting a URL and downloading the file.

Loss #5: michael, for posting this without having any clue.
News for Nerds, stuff that matters. One tends to wish the people POSTING IT actually had any idea what these strange words (like FTP, UDP, TCP) mean.

--Dan

Re:This press release loses. by Hyped01 · 2001-12-14 10:41 · Score: 1

Harik said:
>Loss #5: michael, for posting this without having any clue.
>News for Nerds, stuff that matters. One tends to wish
>the people POSTING IT actually had any idea what
>these strange words (like FTP, UDP, TCP) mean.

A well-made point about the "technical" article on their site as well. Oddly, they cite no real technical information, or anything that would point to any true advantage, but they throw around acronyms and other TCP/IP terms that the general user won't understand "willy-nilly" to try to impress people (and maybe make them forget how little technical merit their information has).

- Robert

--
WebMaster:
BinFeeds
XXX Thumbnailed Image Newsgroups but

Here's a better way for fast file transfers by Deflatamouse! · 2001-12-13 18:20 · Score: 1

Very similar to Huffman encoding...

Compile a table of, say, the top 2^16 1 KB chunks (this size can be arbitrary) of data flowing through the net by observing the traffic on the backbone. Now distribute this table (65536 KB in size) to everybody... heck, make it come by default with Windows and Linux or something. When a person compresses a file, the compression program refers to this table, and whenever it comes across a 1 KB chunk of data in the file that is a 'hit', it replaces it with a 16-bit value (preassigned to that chunk of data) plus an escape sequence. All other data not found in the table gets regular compression.

Now, since the 65536 entries in this table (the size could be any power of 2, of course, depending on how practical the final table size is) are very common in net traffic, it is very likely that your file contains many 'hits', which will considerably reduce your file size (i.e. 1024 bytes reduced to 16 bits plus a few more bits).

You can even have many different tables of this, one for mp3 compression, one for text files, and so on, and just have the compression program analyze your file before it selects a table, and then append a table number in front of the resulting compressed file for decoding.
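A toy Python version of the shared-table scheme, scaled down so it can be read: 8-byte chunks instead of 1 KB, a two-entry table instead of 2^16, and a hypothetical escape format (0xFF 0x01 plus a 16-bit index for a hit, 0xFF 0x00 for a literal 0xFF byte). The "regular compression" pass for the remaining bytes is left out.

```python
import struct

CHUNK = 8      # demo chunk size; the post suggests 1 KB
ESC = 0xFF     # hypothetical escape byte

# Hypothetical shared table; the post's version would hold 2^16 common chunks.
TABLE = [b"HTTP/1.1", b"Content-"]
INDEX = {chunk: i for i, chunk in enumerate(TABLE)}


def encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        piece = data[i:i + CHUNK]
        if len(piece) == CHUNK and piece in INDEX:
            # Hit: ESC, 0x01, 16-bit index (4 bytes replace 8)
            out += bytes([ESC, 0x01]) + struct.pack("!H", INDEX[piece])
            i += CHUNK
        elif data[i] == ESC:
            out += bytes([ESC, 0x00])   # escaped literal 0xFF
            i += 1
        else:
            out.append(data[i])         # ordinary literal byte
            i += 1
    return bytes(out)


def decode(blob: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] != ESC:
            out.append(blob[i])
            i += 1
        elif blob[i + 1] == 0x00:
            out.append(ESC)
            i += 2
        else:
            (idx,) = struct.unpack_from("!H", blob, i + 2)
            out += TABLE[idx]
            i += 4
    return bytes(out)


msg = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n\xff"
packed = encode(msg)
assert decode(packed) == msg
print(len(msg), "->", len(packed), "bytes")
```

Both ends must hold the exact same table, which is why the post suggests shipping it with the OS and tagging the output with a table number.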

Redundancy by Anonymous Coward · 2001-12-13 18:22 · Score: 0

So it's actually sending significantly MORE data to achieve some tolerance of dropped packets. And how is this a win over just sending the same packets multiple times?

Re:Redundancy by BlueUnderwear · 2001-12-13 22:53 · Score: 2

And how is this a win over just sending the same packets multiple times?
Sending the same packet multiple times leads to a higher volume (you at least double it), with actually less fault tolerance.
Suppose you have a file made up of a hundred blocks, and a 5% probability of packet loss.
If you transmit each packet twice, each packet has a 0.05 * 0.05 = 0.25% probability that you lose both copies. This may seem small, but if you calculate it, the probability of that not happening for any of the 100 packets is only (1-0.0025)^100 ≈ 77.9%. So you have roughly one chance in 4 of not transmitting your file in its entirety. And to achieve this modest result, you needed to transmit 200 packets rather than 100.
Now compare this to FEC: you transmit a number of additional packets, each of which can compensate for any lost packet, not just its companion. Adding just 20 more packets brings the probability of a complete, successful transmission to 99.9999% (that's the probability that at most 20 of the 120 packets are lost, given an individual loss probability of 5%). So you transmit fewer packets (120 rather than 200), and you get a higher assurance of overall success.
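The two figures above can be reproduced with Python's `math.comb`. The FEC case assumes an ideal erasure code where any 100 of the 120 packets suffice; practical codes need slightly more.

```python
from math import comb


def p_at_most(n: int, k: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance that at most k packets are lost."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


LOSS = 0.05
# Send each of 100 packets twice; fail if both copies of any packet are lost.
p_repeat_ok = (1 - LOSS**2) ** 100
# 100 packets + 20 FEC packets; succeed if at most 20 of the 120 are lost.
p_fec_ok = p_at_most(120, 20, LOSS)
print(f"repeat x2 (200 pkts): {p_repeat_ok:.4f}")
print(f"FEC +20   (120 pkts): {p_fec_ok:.7f}")
```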

--
Say no to software patents.

Congestion control? by Cato · 2001-12-13 21:09 · Score: 2

This sounds like a problem for congestion control - UDP apps frequently don't have TCP-friendly congestion control, so they can get an unfair share of the network's bandwidth, affecting other applications that use TCP or TCP-friendly UDP protocols. FLID sounds like their form of congestion control, but if they are doing better than TCP it's quite likely it is more aggressive in its network impact. However, handling long burst errors sounds like a useful thing - of course, this can mask interesting errors in network configuration, e.g. lack of FEC on a wireless/satellite link, or lack of Frame Relay traffic shaping on a router with a FR link (this happened to a customer yesterday...).
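For a sense of what "TCP-friendly" means numerically, here is the simplified steady-state TCP throughput model of Mathis et al., rate ≈ (MSS/RTT)·sqrt(3/2)/sqrt(p), in Python. RFC 5348 (TFRC) uses a more detailed equation, and the numbers below are illustrative. A sender that keeps transmitting at a fixed rate regardless of the loss rate p ignores this ceiling, which is the fairness concern raised above.

```python
from math import sqrt


def tcp_friendly_rate(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Approximate TCP throughput in bytes/s: (MSS / RTT) * sqrt(3/2) / sqrt(p).

    Simplified Mathis et al. model; a TCP-friendly UDP sender would stay at or
    below this rate for the RTT and loss rate it observes.
    """
    return (mss_bytes / rtt_s) * sqrt(1.5) / sqrt(loss)


rate = tcp_friendly_rate(1460, 0.1, 0.01)   # 1460-byte segments, 100 ms RTT, 1% loss
print(f"{rate * 8 / 1e6:.2f} Mb/s")         # about 1.43 Mb/s
# Quadrupling the loss rate halves the allowed rate.
print(f"{tcp_friendly_rate(1460, 0.1, 0.04) * 8 / 1e6:.2f} Mb/s")
```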

Re: UDP+Math........ by Anonymous Coward · 2001-12-13 22:03 · Score: 0

Cutting through the vague description of this 'proprietary algorithm', it could be nothing more than an application-level use of a block-oriented forward-error-correction code (the "+Math" part). Send a stream of bytes (the original data for a systematic code, or some polynomial transform of it for a non-systematic code; this is the "UDP" part) plus parity bytes ad nauseam until the receiver has enough parity bytes to get a valid decode of the data, at which point the receiver sends back one ACK. It ain't rocket science, but it could be a neat trick. You've got to have some sequence information (ya gotta know where each byte goes in the polynomial), but you can get the bytes out of sequence, or lose some entirely. Holes in the sequence, whether in data or parity bytes, get patched by the block code once you've received enough parity bytes (coding theory 101). The Internet MDPv2 protocol uses block codes in a similar way to reduce the number of ACKs (and redundant repetitions of the same errored block), though with its support for reliable multicast it's something of the converse of the examples others have given in this thread.
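The "polynomial" view of a block code described above can be sketched as a toy systematic Reed-Solomon-style erasure code: data symbols are the values of a polynomial at known positions, parity symbols are extra evaluations, and any k symbols with their positions rebuild the data. This is a hypothetical illustration, not the company's algorithm or MDPv2's code; it assumes the prime field GF(257) for simplicity (real implementations usually work in GF(2^8)), and it uses slow O(k^2) Lagrange interpolation.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)

def lagrange_eval(points, x):
    """Value at x of the unique polynomial (mod P) passing through `points`."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Systematic code: the k data symbols are the polynomial's values at
    x = 0..k-1, and the n - k parity symbols are its values at x = k..n-1.
    Each symbol carries its x position (the 'sequence information')."""
    if not len(data) <= n <= P:
        raise ValueError("need len(data) <= n <= P")
    points = list(enumerate(data))
    return points + [(x, lagrange_eval(points, x)) for x in range(len(data), n)]

def decode(received, k):
    """Rebuild the k data symbols from any k received (x, y) symbols with
    distinct x, in any order, whether they are data or parity."""
    if len(received) < k:
        raise ValueError("not enough symbols yet; keep listening")
    points = received[:k]
    return [lagrange_eval(points, x) for x in range(k)]

coded = encode(list(b"hello"), 8)  # 5 data + 3 parity symbols
survivors = [coded[7], coded[1], coded[5], coded[4], coded[6]]  # 0, 2, 3 lost
print(bytes(decode(survivors, 5)))
```

Here three of the eight symbols are lost and the rest arrive out of order, yet the receiver recovers `b'hello'`, which is the point where it would send its single ACK.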

Re: UDP + Math ..... by Anonymous Coward · 2001-12-13 22:17 · Score: 0

Regarding open solutions: the algorithms, information-correcting code, and source code for the Multicast Dissemination Protocol (MDPv2) are all available on the net, with versions for WINTEL, LINUX, and other UNIX flavors. If it's not what's done here, you could adapt the packet structures and source code for large point-to-point transfers with reduced ACK requirements; a good project to put on SourceForge.
I agree with the earlier comment that the only thing possibly new about the parent story here is the marketing hype. And for all that I like MDPv2 and its efficiency on the Internet, it's inspired by an old IEEE Communications Theory Section paper from the '70s about reliable multicast in SATCOM systems. So maybe the lesson is that some good ideas keep coming back in different guises and uses.

moderators - what's up? by Raphael · 2001-12-14 01:20 · Score: 1

Now this is interesting... My previous comment (parent of this one) was slightly wrong because the solution proposed by this company works differently. They use some FEC method that is better than Reed-Solomon codes, as I discovered later by reading their detailed technical documentation (available in their technology library).

But I cannot understand why my comment, which proposed a different method, was rated three times as "Overrated (-1)", while its parent was rated twice as "Interesting (+1)". I do not understand why three different moderators would give exactly the same moderation, especially when the comment was already scored at 0 and the article contains many other comments that could have been moderated up or down.

--
-Raphaël

Re:Archie .vs. Veronica by Brennus57 · 2001-12-14 03:54 · Score: 1

Well I don't know who might have said that Archie was latently homosexual. If anyone did I suspect it was probably the Rev. Jerry Falwell or some other member of America's Taliban. They seem to see homosexuals peeking out of every closet.

As far as Archie's relationships are concerned, I think you can pretty much sum it up with this quote: "Veronica sends me and Betty defends me". He's clearly been completely infatuated with Veronica since the '40s. I suspect that her father's complete disapproval of Archie has added the lure of forbidden fruit to her obvious charms. Meanwhile Betty is always ready to rush to his rescue when he's in a jam and comfort him when he's down. It's a lot like the three-way relationship between Johnny Rico, Carmen and Dizzy in Paul Verhoeven's adaptation of "Starship Troopers", but I digress.

Bhil

Wavelets? by Anonymous Coward · 2001-12-14 04:53 · Score: 0

This reminds me of a 'transform'-like mathematical theory called wavelets that I read a paper on back in the late '80s...

Zmodem works wonderfully over SSH by Anonymous Coward · 2001-12-15 09:24 · Score: 0

...with the benefit of being secure.

Just sz or rz like in the good ol' dayz :)

Re:I've wored in at least 2 positions, by Anonymous Coward · 2001-12-15 16:46 · Score: 0

woried

That's worried, you spelling-flame fuckhead.

Slashdot Mirror

449 comments