Software Distribution via Multicast?
RockyMountain asks: "When it took me over 24 hours to download the latest Mandrake ISOs, I got to wondering... why do we still put up with servers overloaded with zillions of simultaneous TCP connections, all sending copies of the same thing? Hasn't multicasting evolved to the point where there's a better way? A quick look at Freshmeat turned up no obvious candidates. Are there any protocols or programs for distributing software via multicast? Are there any evolving standards? Or are there fundamental problems with this approach that I am overlooking?" An interesting question. With my limited understanding of Multicast, I would think that, at the very least, if you are a software distribution site you can have software distribution "channels", where each channel serves one piece of software. Multicast clients wanting a specific piece of software would connect to the right channel and wait until the next time it starts serving the software from the beginning (or, in the case of an interrupted connection, until the channel gets to the appropriate resume point). Might such a system be ideal for multicast? Can any of you come up with others?
The biggest problem I see with a multicast method for ISO images isn't the varying start times, but the varying connection speeds of users.
Multicast video works because a stream needs only a fixed number of bits at any one time. That is, a 24k video stream needs only 24k of pipe to work; having a T1 downstream won't help you get the video any faster. But having a T1 will help you download an ISO image faster so you can start the install process.
Also streaming video and such does not require a perfect stream... if a piece is missing it just ignores it and goes on its way. But an ISO image needs to be perfect. If not you just made a nice coaster for your coffee cup.
The only way I see it working is if everyone agreed to download at the speed of the slowest link. And I'm not going to agree to let my DSL line go to waste so I can download at the 33.6k of the dialup user who wants to wait 4 days for a download. Also having to be perfect would require the server to resend anytime a client reported a lost or corrupted packet. One needs only to be familiar with Norton Ghost and a lab with one bad NIC or HDD to see the crawl this will result in till the bad box times out.
So while nice in theory I doubt it would have much benefit outside of a controlled lab environment where everyone is on the same high-speed connection and there is very little loss of packets.
I'm no expert on network protocols. I'm not even a software guy. So some of what follows may seem very naive. Bear with me and see if this makes sense. Here's how I see it working.
Data Rate. The server would send several streams at once on several channels, each one paced for a different data rate. For example, the T1 user would pick a different channel than the 28K modem user. Each channel endlessly repeats the same data set, over & over.
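To make the multi-rate idea a bit more concrete, here is a minimal sketch of how a client might choose among such channels. The channel names and rates in the table are made up for illustration, and so are the function names; nothing here comes from an existing protocol.

```python
# Hypothetical channel table: channel name -> pacing rate in bits per second.
CHANNEL_RATES_BPS = {
    "modem-28k": 28_800,
    "isdn-128k": 128_000,
    "dsl-768k": 768_000,
    "t1": 1_544_000,
}


def pick_channel(link_bps, channels=CHANNEL_RATES_BPS):
    """Return the fastest channel whose rate fits the client's link, or None."""
    usable = {name: rate for name, rate in channels.items() if rate <= link_bps}
    if not usable:
        return None
    return max(usable, key=usable.get)


def cycle_seconds(image_bytes, rate_bps):
    """Time for one full repetition of the data set at the given pacing rate."""
    return image_bytes * 8 / rate_bps
```

A 33.6k dialup user would end up on the 28.8k channel and a T1 user on the T1 channel, so neither is held to the other's pace. The cost is that the server sends one copy of the data per channel rather than one copy in total.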
Keeping Track. Each datagram sent would contain an offset value that shows where it fits into the big picture. Thus, the client knows which parts of the whole have been received, and which ones have not. As we shall see, this helps deal with start time synchronisation and dropped packet issues.
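As a rough illustration of the offset tagging, a datagram could carry a small fixed header in front of the payload. The layout below (an 8-byte offset followed by a 2-byte payload length, in network byte order) is purely an assumption for this sketch, not a format taken from any real protocol.

```python
import struct

# Hypothetical header: 8-byte byte offset + 2-byte payload length, network order.
HEADER = struct.Struct("!QH")


def pack_datagram(offset, payload):
    """Prefix the payload with its position in the overall data set."""
    return HEADER.pack(offset, len(payload)) + payload


def unpack_datagram(datagram):
    """Split a datagram back into (offset, payload), rejecting short reads."""
    offset, length = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated datagram")
    return offset, payload
```

Because every datagram says where it belongs, the client can accept them in any order and from any repetition of the data set.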
Start Time. You don't even try to synchronise start times. If a client connects in the middle, so what. It just stores the second half of the data set, then stays on the line for the next repetition of the first half. The client knows when it has received the whole data set, because each datagram is tagged with an offset that shows where it fits into the big picture.
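Here is a small simulation of that behaviour, with no real networking involved. The class and function names are invented for the sketch, and it assumes every datagram starts on a chunk boundary.

```python
class CarouselReceiver:
    """Collects chunks in any order and knows when the data set is complete."""

    def __init__(self, total_size, chunk_size):
        self.chunk_size = chunk_size
        self.buffer = bytearray(total_size)
        chunk_count = -(-total_size // chunk_size)  # ceiling division
        self.missing = set(range(chunk_count))

    def receive(self, offset, payload):
        index = offset // self.chunk_size
        if index in self.missing:  # ignore chunks we already have
            self.buffer[offset:offset + len(payload)] = payload
            self.missing.discard(index)

    def complete(self):
        return not self.missing


def carousel(data, chunk_size, start_chunk=0):
    """Yield (offset, payload) forever, wrapping around like a repeating channel."""
    chunk_count = -(-len(data) // chunk_size)
    i = start_chunk
    while True:
        offset = (i % chunk_count) * chunk_size
        yield offset, data[offset:offset + chunk_size]
        i += 1
```

A receiver that joins at chunk 13 of 26 stores chunks 13 to 25, keeps listening as the channel wraps around, and is complete after chunk 12 arrives. With no loss, that is exactly one cycle regardless of where it joined.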
Missed Packets. This is the hard part. If a client misses a packet because it is dropped en route, or for whatever reason, there are a few ways to deal with it.
- The client could just wait for the next iteration of the data set, and
listen for the datagrams that fill in the blanks.
- The protocol could use a UDP backchannel which allows clients to request
retransmissions of datagrams by offset. The server could keep track of which
datagrams have been requested, and periodically retransmit those datagrams
out of sequence. If there are too many, and forward progress is threatened,
the server could keep a histogram of which packets have been requested most
often, and resend the most-requested ones only -- let the others wait for the
next iteration.
- My favorite approach: The protocol could get most of the data across,
and just not worry about the occasional gap. Once the client has a mostly-
complete data set, it could use a connected point-to-point protocol to fill
in the gaps. Rsync, for example, is very good at filling small gaps in
otherwise complete data sets. (True, this is point-to-point, and partially
defeats the purpose of using multicast, but since it's only used for
relatively tiny parts of the data set, the connections should be short-lived
and relatively few in number.)
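The second and third options can be sketched roughly as follows. The server-side class keeps the histogram of requested offsets and resends only the most popular ones within a fixed budget. The client-side helper turns the set of missing chunks into byte ranges that could be handed to some point-to-point fetch. All names and the budget parameter are assumptions made for this sketch.

```python
from collections import Counter


class RetransmitQueue:
    """Server side: count retransmission requests, resend only the top few."""

    def __init__(self, budget):
        self.budget = budget          # max out-of-sequence resends per round
        self.requests = Counter()     # offset -> number of clients asking

    def request(self, offset):
        self.requests[offset] += 1

    def next_batch(self):
        """Return the most-requested offsets; the rest wait for the next cycle."""
        batch = [offset for offset, _ in self.requests.most_common(self.budget)]
        self.requests.clear()
        return batch


def missing_ranges(missing_chunks, chunk_size):
    """Client side: merge missing chunk indexes into (start, length) byte ranges."""
    ranges = []
    for index in sorted(missing_chunks):
        start = index * chunk_size
        if ranges and ranges[-1][0] + ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + chunk_size)
        else:
            ranges.append((start, chunk_size))
    return ranges
```

Clearing the histogram after each batch means an unpopular request is simply dropped, and that client picks the datagram up on the next repetition. The length of the final range may run past the end of the file if the last chunk is short, so a real client would need to clamp it.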
Does any of this make sense?
This is definitely a job for Swarmcast. It avoids most of the problems people have identified, although extensive firewalling can still undermine it.
Alternatively, Gnutella- or eDonkey-like programs can be used.
sigs are a waste of space
Very cool stuff Justin!
My question for you is: Will it work over the Internet as is now, or do all the routers in between the source and the destinations have to be specially set up to handle the multicasting traffic? I did a number of multicast experiments a couple years ago and found multicast to be unusable over the net because the routers dropped all the packets.
--jeff
ipv6 is my vpn