UDP + Math = Fast File Transfers
Wolfger writes: "A new file transfer protocol talked about in EE Times gets data from one place to another without actually transmitting the data..." Well, the information is transmitted, just not in its original form. I think there are a couple of other places working on similar ideas - wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name.
In essence, is not this the same as file compression? The amount of information is the same
(for those, who remember, what Bit is). It is just, that the usual one character per byte is awfully wastefull. Which is why the various compressors are so effective.
Add a modern data transfer protocol and you may :-)
get some start up money
In Soviet Washington the swamp drains you.
I mean it's not for image compression specifically, but it definitely reminds me of IFS image compression in some ways. I'll bet that compression is very time consuming, but that's fine if you're warehousing data. I wonder if the clients are pre-loaded with a body of parameterized functions, so that the server just sends information describing what functions to run and what the parameters are. I guess if it's all based on polynomials all it needs to send are vectors of constants.
Neat idea. Patents: here and here.
The program was called xolox,
;)
I know the developper personally and he is very disappointed about the corporate feedback he got.
People loved it, corporations didnt, so he shut down his site and with it Xolox ( unless you have a hacked version of course
Cheers.
My other sig is Funny.
This could be done easily without the proprietary algorithms. Just send update packets with a header in each on stating that it is packet number N and there are X total packets. Then, request missing packets when you get towards the end, and put them all together in order when you get them all.
Somewhat unrelated --- Does anyone else miss Z-Modem. We need a zmodem like program for that works over telnet so we don't have to open a separate FTP session. In the BBS days, you just typed rz filename and it came to you.
Quite correct. This protocol does not sound at all TCP friendly. It needs some way of dynamically responding to network conditions to be that way. Even something so simple as doing an initial bandwidth test, then rate limiting the UDP packets to 90% of that would be a big help, though for a large file that would still create a lot of congestion problems.
Does anybody know if IPV6 has any kind of congestion notification ICMP messages so stacks can know to throttle applications when the applications are blasting out data that's congesting the network?
Need a Python, C++, Unix, Linux develop
The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..."
This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed. The only time you get a 1 for 1 ack ratio is if you miss a packet and the window slams shut.
Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's. Frame Relay switches may drop packets if you exceed your CIR and begin bursting, so that whole transfer will never succeed. Therefore you actually waste bandwidth cause the whole transfer is doomed to fail, and the sender will never know it.
Also some routers have WRED configured in their queues, purposely dropping TCP packets to increase bandwidth on a global scale. This would damage the file transfer process as well if it was UDP based, as this system is.
Stick with the RFC's and the tried and true TCP transport system. This company will fail.
--- RFC 1149 Compliant.
The 'equations' are broken up into pieces such that if you recieve any N pieces, you can reconstruct the entire file. It's like how some key sharing schemes work. Like Publius for example. Any N 'shares' of the key can be used to reconstruct the entire key. In this case the 'key' is the whole document, and I'm betting they use a different sharing scheme than ones already used for cryptography.
Need a Python, C++, Unix, Linux develop
I think you missed the point. Sure, you need only to sip from the fire hose to finally quench your thirst, but that wastes a lot of water.
Where TCP acks and controls flow on every single packet, which also must be received by the sender in order, a protocol like this can occasionally send back a report on what kind and how many packets it is receiving (fragmented, packet loss due to bandwith, packet loss due to congestion).
Sounds like Code Excited Linear Predition. Linear prediction uses an equation which can be used to replicate a data stream. Code excited, in this case, means you have a variety of equations to choose from, and you can plug in different coefficients for the various equations. Consequently, all you need to transmit are:
- which equation to use (if there are 256 equations to choose from, that's an 8-bit value)
- what coefficients to use with those equations
- how long to follow the resulting data sequence, before you need to change the values
CELP was applied particularly to speech compression. The DOD was using it with 4800 bps (yes, that's 4.8 kb/s, about 1/2 - 2/3 the bandwidth used by PCS cellphones) modems and encryption systems to transmit secure voice.It sounds to me like they've got a more general selection of equations, and the resulting datastreams are interleaved. If you don't get the information for all the equations, you can't accurately reconstruct the data. Additionally, since you don't have to respond to every packet, you reduce the turnaround.
Unless I'm mistaken, Fiber Channel already does the latter part.
In short: a more general application of existing compression schemes and protocols.
95% of what you read about these days is just mucking around with new combinations and applications for existing inventions.
After looking through some of the material on the company's web site, this product appears to be based on LT (Luby Transform) coding. Each encoded packet is the result of XORing a random selected set of segments from the original file. When sufficient packets have been received, they can be used to reconstruct the original file. Insert magic algorithm. The nice thing about this is that the transmitter can continually stream packets, and a receiver can start collecting packets at any point in time. When the receiver has collected sufficient packets, it can reconstruct the original file. Packet ordering is totally irrelevant. You just need enough packets to generate a unique solution. The math for the code has not been published yet, but this is supposed to be a successor to tornado codes, which have been described in the literature.
Mea navis aericumbens anguillis abundat
Multicast is congestion friendly in that it can inherently seriously reduce bandwidth consumption, but it's obviously not congestion adaptive. I think the easiest (and probably best) way to introduce congestion control in a multicast is to have multiple streams at different rates, e.g., you have five simultaneous multicast webcasts of a Victoria Secret show, and folks can join the mulitcast group with the video quality that best suits the bandwidth and congestion on their pipes. This idea works very well for the Digital Fountain-style software distribution: rather than having a server serving the same stream at 10 different rates, you can have 10 servers streaming their own unsynchronized Tornado encodings, and clients can subscribe to however many their pipes can support. With 10 streams of data coming at you, you can reconstruct the original data ~10 times as fast.
This is really funny. I used to work for the NYSE (Stock Exchange) and it wasn't uncommon to transmit 10GB of data over night via FTP to a single brokerage. The data that they were transmitting was at least as valuable as chip designs. It would allow you to alter trade position after trades had been transacted but before they cleared. (e.g. cancel a bad transaction or increase the amount of a good transaction)
The Economics of Website Security
It seems to me that article indeed speaks about network that has high latency but high bandwidth with some loss. How about simply compressing the data and using bigger packets to transmit it? If you can use big enough window while sending data you can push all the data to the network in the beginning. Conversation comes to A) Here it comes A) Done [64 packets, 125MB] B) Okay, listening B) Resend 2,7 and 25 A) Done [3 packets, 6MB] B) OK. Note that A starts sending before getting reply from B. In fact, with fast long-distance connection it could be that A gets to the end before B getting "Here it comes".
I think if we want to speed up file transfer we need an API to tell OS that we're going to send lots of data so make it big packets or the opposite. Currently we just open socket connection to destination and start write()ing. OS has no way to guess whether or not we're going to write 100 or 10e8 bytes. We need a way to tell OS that the data we're sending isn't worth a dime before it's all done so make it big packets to minimize bandwidth wasted to TCP control traffic.
You can opt to waste bandwidth to reduce perceived latency and that's what I think is done here. A sends file twice and in a case some packets were lost the sent copy would be used to fill in missing parts. A has sent missing packet before B had known it's missing it. Yeah, A wasted half the bandwidth for the redundant data that got correctly to the destination at the first time but we aren't interested in that. The key here is to use UDP so that lost packets are really lost instead of automatically resend. This kind of setup increases overall throughput only if latency is the only problem in your network. Perhaps this is needed in the future because increasing bandwidth is easy - not necessarily cheap - but the light crawling inside fiber is so damn slow!
_________________________
Spelling and grammar mistakes left as an exercise for the reader.
Bittorrent
does that! It's not really trying to be a napster or gnutella but a way to allow people to host a high volume website without having a lot of bandwidth. It works seamlessly with mozilla and IE. It's also quite fast unlike the anonymous p2p networks.
This could be true genius or mediocre junk depending on the details. That article wasn't clear: Does it have to be N specific packets (in any order) or can it be any N of the transmitted packets?
N specific packets is, well, nothing special. The idea of using negative acknowledgements instead of positive acknowledgements and retransmitting only the missing pieces is nothing new. It was used extensively in pre-Internet days on dedicated connections. Ever heard of Z-Modem? Its been avoided on the Internet because it makes the congestion control problem very difficult. Translating it into symbols is really just saying that they also compress the file first. Whoop de doo.
On the other hand, if it can be ANY set of N of the transmitted packets, well, that's downright incredible. The applications for such a technique are boundless... Everything from finally making multicast viable and desirable to latency and congestion indifferent file transfers to ultra-reliable offline storage.
How would you like a web server that can serve a T3's worth of clients off of a T1 and do it in such a way that the smart web browser can pre-cache data from the server in anticipation of the reader's next click, realtime adaptive based on the documents currently in multicast transmission to somebody else? Oh yeah, the web server can be on the Moon and respond to you as fast as if it was across the street. Genius.
So which is it? Any set of N or a specific set?
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
A company located at www.tsbn.com has been doing this since at least 1996. Their IP is filed back past then anyway. They claim to do it all in software without devices. Guys from some of the companies mentioned above have stock in TSBN.. interesting.
Data transfered to variouse financial routines are routinely transmitted using insecure protocols in an unencrypted format. I've wored in at least 2 positions, one in a creditcard clearinghouse, another at an ECN. This data goes over unsecured channels all the time.
-- I'm the root of all that's evil, but you can call me cookie..