Software Distribution via Multicast?
RockyMountain asks: "When it took me over 24 hours to download the latest Mandrake ISOs, I got to wondering...why do we still put up with servers overloaded with zillions of simultaneous TCP connections, all sending copies of the same thing? Hasn't multicasting evolved to the point where there's a better way? A quick look at Freshmeat turned up no obvious candidates. Are there any protocols or programs for distributing software via multicast? Are there any evolving standards? Or are there fundamental problems with this approach that I am overlooking?" An interesting question. With my limited understanding of multicast, I would think that, at the very least, if you are a software distribution site you can have software distribution "channels", where each channel serves one piece of software. Multicast clients wanting a specific piece of software would connect to the right channel and wait until the next time it starts serving the software from the beginning (or, in the case of an interrupted connection, until the channel gets to the appropriate resume point). Might such a system be ideal for multicast? Can any of you come up with others?
There are a few problems with multicast. First of all, you need the right hardware in line (routers, firewalls, etc.) to handle multicast packets and get them all the way to the end user. Second, you have to find the right content to send. This is a big one. People want the content they want, when they want it. They don't feel like waiting, even if it would be more efficient.
But a multicast model is worth a look. On September 11th, it was, for the most part, a multicast model (broadcast television) that got us our information. Most of the news web sites could not handle the load using the unicast method. So it's a good idea, but like you said, not that much seems to be going on with it. I'd love to hear about some good, real-world implementations of multicast.
KidA
"Karma can only be portioned out by the cosmos." -Homer Simpson
SQUIGGY, genuine New Jersey Oi!
What multicast is very good for is replicating installs. If you want to burn one image to every disk in a room full of computers you can easily start the download client on every computer and then start the multicast session.
Overall, multicast could be useful if anyone actually wrote convenient software to serve and receive it, and for the geek crowd that downloads distribution .iso files it actually might work, but the normal internet public has enough issues with the current download-on-demand thing without being bothered with multicast downloading.
For video and audio broadcasts it's just ideal. With one simple cable/dsl connection *anyone* can become an internet radio/TV.
What is the state of multicast capabilities throughout the net? Do ISPs use it? Do they let their clients use it?
Pedro Côrte-Real.
The biggest problem I see with a multicast method for ISO images isn't the varying start times, but the varying connection speeds of users.
Multicast video works because there is only X number of bits needed at any one time. Aka, a 24k video stream only needs 24k of pipe to work, having a T1 down won't help you get the video faster. But having a T1 will help you download an ISO image faster so you can start the install process.
Also streaming video and such does not require a perfect stream... if a piece is missing it just ignores it and goes on its way. But an ISO image needs to be perfect. If not you just made a nice coaster for your coffee cup.
The only way I see it working is if everyone agreed to download at the speed of the slowest link. And I'm not going to agree to let my DSL line go to waste so I can download at the 33.6k of the dialup user who wants to wait 4 days for a download. Also having to be perfect would require the server to resend anytime a client reported a lost or corrupted packet. One needs only to be familiar with Norton Ghost and a lab with one bad NIC or HDD to see the crawl this will result in till the bad box times out.
So while nice in theory I doubt it would have much benefit outside of a controlled lab environment where everyone is on the same high-speed connection and there is very little loss of packets.
iRepairIT - iPhone, Mac, & PC Repair
For things like .iso files, where the user needs the whole file before they can use any of it, there is no need to start from the beginning of the file.
Server sends the file over and over again, as long as there is at least one member of the multicast group.
Clients join the multicast group and start recording wherever in the file they happen to find themselves. When the file stream ends, it simply starts from the beginning again and the client proceeds to capture the part that it missed the first time around before disconnecting.
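A minimal sketch of that looping scheme, with the network simulated by a Python generator rather than a real multicast socket. The names `carousel` and `receive` and the fixed block size are assumptions made for illustration; nothing here is taken from an existing tool.

```python
def carousel(data: bytes, block_size: int, start_block: int):
    """Yield (offset, chunk) pairs forever, the way a looping server would.

    start_block mimics a client that joins somewhere in the middle.
    """
    n_blocks = (len(data) + block_size - 1) // block_size
    i = start_block
    while True:
        offset = (i % n_blocks) * block_size
        yield offset, data[offset:offset + block_size]
        i += 1


def receive(stream, total_size: int, block_size: int) -> bytes:
    """Record blocks from wherever the stream is and stop once all are seen."""
    n_blocks = (total_size + block_size - 1) // block_size
    buf = bytearray(total_size)
    have = set()  # indices of blocks captured so far
    for offset, chunk in stream:
        buf[offset:offset + len(chunk)] = chunk
        have.add(offset // block_size)
        if len(have) == n_blocks:
            break  # wrapped around and filled in the part missed at first
    return bytes(buf)
```

A client that joins at block 7 records to the end of the file, keeps listening through the wrap-around, and leaves once blocks 0 through 6 have arrived.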
Will Dyson
"We can't stop here
Look at BitTorrent.
http://bitconjurer.org/BitTorrent/
It's not multicast per se, but seeks to avoid the horrific inefficiencies you've noted.
You could think of it as inspired by mojoNation, but it's a different architecture focusing on a different problem.
I always thought that multicasting software would be ideal not to the client, but to proxy servers. The proxy would then send out the files to a smaller group of people "the old fashioned way". I guess it's really nothing more than a fancy mirroring system...
I'm no expert on network protocols. I'm not even a software guy. So some of what follows may seem very naive. Bear with me and see if this makes sense. Here's how I see it working.
Data Rate. The server would send several streams at once on several channels, each one paced for a different data rate. For example, the T1 user would pick a different channel than the 28K modem user. Each channel endlessly repeats the same data set, over & over.
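As a rough illustration of the per-rate channels, a client could pick the fastest channel its link can keep up with, and the time for one full repetition follows from the file size. The channel rates and the 650 MB image size below are assumptions chosen for the example, not figures from any real server.

```python
def pick_channel(link_bps: int, channel_rates_bps):
    """Return the fastest channel rate that does not exceed the client's link,
    or None if even the slowest channel is too fast."""
    usable = [rate for rate in channel_rates_bps if rate <= link_bps]
    return max(usable) if usable else None


def pass_seconds(size_bytes: int, rate_bps: int) -> float:
    """Seconds needed for one complete repetition of the data set."""
    return size_bytes * 8 / rate_bps


# Hypothetical channel line-up: modem, ISDN-class, T1.
CHANNELS = [28_800, 128_000, 1_544_000]
```

With these numbers a 1.5 Mbit/s DSL user falls back to the 128 kbit/s channel, which suggests the server would want fairly fine-grained rate steps.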
Keeping Track. Each datagram sent would contain an offset value that shows where it fits into the big picture. Thus, the client knows which parts of the whole have been received, and which ones have not. As we shall see, this helps deal with start time synchronisation and dropped packet issues.
Start Time. You don't even try to synchronise start times. If a client connects in the middle, so what. It just stores the second half of the data set, then stays on the line for the next repetition of the first half. The client knows when it has received the whole data set, because each datagram is tagged with an offset that shows where it fits into the big picture.
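The offset tagging might be sketched like this. The header layout (an 8-byte offset plus a 2-byte payload length) and the `Reassembler` class are hypothetical, picked only to make the idea concrete.

```python
import struct

# Hypothetical header: 8-byte byte offset, 2-byte payload length, network order.
HEADER = struct.Struct("!QH")


def pack_datagram(offset: int, payload: bytes) -> bytes:
    """Prefix the payload with its position in the whole data set."""
    return HEADER.pack(offset, len(payload)) + payload


def unpack_datagram(datagram: bytes):
    """Split a datagram back into (offset, payload)."""
    offset, length = HEADER.unpack_from(datagram)
    return offset, datagram[HEADER.size:HEADER.size + length]


class Reassembler:
    """Client-side bookkeeping: which offsets are still outstanding."""

    def __init__(self, total_size: int, block_size: int):
        self.buf = bytearray(total_size)
        self.pending = set(range(0, total_size, block_size))

    def add(self, datagram: bytes) -> None:
        offset, payload = unpack_datagram(datagram)
        if offset in self.pending:  # ignore duplicates from later repetitions
            self.buf[offset:offset + len(payload)] = payload
            self.pending.discard(offset)

    def missing_offsets(self):
        return sorted(self.pending)

    @property
    def complete(self) -> bool:
        return not self.pending
```

Because every datagram says where it belongs, arrival order does not matter, and `missing_offsets()` gives the client the list of blanks to listen for or request.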
Missed Packets. This is the hard part. If a client misses a packet because it is dropped en route, or for whatever reason, there are a few ways to deal with it.
- The client could just wait for the next iteration of the data set, and
listen for the datagrams that fill in the blanks.
- The protocol could use a UDP backchannel which allows clients to request
retransmissions of datagrams by offset. The server could keep track of which
datagrams have been requested, and periodically retransmit those datagrams
out of sequence. If there are too many, and forward progress is threatened,
the server could keep a histogram of which packets have been requested most
often, and resend the most-requested ones only -- let the others wait for the
next iteration.
- My favorite approach: The protocol could get most of the data across,
and just not worry about the occasional gap. Once the client has a mostly-
complete data set, it could use a connected point-to-point protocol to fill
in the gaps. Rsync, for example, is very good at filling small gaps in
otherwise complete data sets. (True, this is point-to-point, and partially
defeats the purpose of using multicast, but since it's only used for
relatively tiny parts of the data set, the connections should be short-lived
and relatively few in number.)
Does any of this make sense?
OpenCola has a commercial and open source solution for a similar problem. http://www.opencola.org:8080/
This is definitely a job for Swarmcast. It avoids most of the problems people have identified, although extensive firewalling can still undermine it.
Alternatively, Gnutella or eDonkey like programs can be used.
sigs are a waste of space
I remembered this from the dead-tree edition, and luckily it's one of the articles that has full text available online.
Check it out here...
The Attitude Adjuster, I hate me, you can too.
Because you're downloading Mandrake, this might not be useful for you. I'm posting this in case it might help some Windows admins out there. Intel makes a product called LANDesk Management Suite that does multicast software distribution. There are two things that I should explain at this point. 1) In the current version (6.4) of the product, multicast software distribution is an add-on that must be purchased separately. 2) It is misnamed because it doesn't use multicast packets.
The way it works is that the server will send a command to one node on each subnet telling them to fetch the software from a specified location. Then the rest of the nodes will be given a command telling them to fetch the software from the designated computer on their subnet. So it should be called a multi-tiered distribution instead of multicast distribution. It works well and is worth looking into if you have to do this sort of task all the time.
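To make the two-tier idea concrete, here is a small sketch that groups hosts by subnet and names one relay per group. It is not LANDesk's API or its actual selection logic; using the first host seen as the relay and assuming /24 subnets are both simplifications of mine.

```python
import ipaddress
from collections import defaultdict


def plan_distribution(server: str, hosts, prefix_len: int = 24):
    """Map each host to the place it should fetch from.

    One host per subnet (here: the first one listed, an arbitrary choice)
    fetches from the server; the rest fetch from that relay.
    """
    by_subnet = defaultdict(list)
    for host in hosts:
        net = ipaddress.ip_network(f"{host}/{prefix_len}", strict=False)
        by_subnet[net].append(host)

    plan = {}
    for members in by_subnet.values():
        relay = members[0]
        plan[relay] = server
        for host in members[1:]:
            plan[host] = relay
    return plan
```

The server's outbound load then scales with the number of subnets rather than the number of machines.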
Just after I posted my last message, I found out that Intel is releasing version 6.5, which will include the multicasting add-on for no additional fee.
Hi, I am Justin Chapweske, the inventor of OpenCola's Swarmcast. I am now working on another software project to specifically address the needs of content distribution over multicast, and the Onion Networks FEC Library is the first step in building that solution. The FEC library will provide the foundation of our future open source multicast content distribution software, so keep an eye out at http://onionnetworks.com for more info.
The cornerstone technology to any reliable multicast system is FEC (Forward Error Correction) which is an encoding technique that can repair lost or corrupt packets.
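For readers new to the idea, the simplest possible erasure code is a single XOR parity packet, which can rebuild any one lost packet in a group. The sketch below shows only that toy case, assuming equal-length packets; it is not the Onion Networks library or its API, and practical FEC codes can repair more than one loss per group.

```python
def xor_parity(packets):
    """Return a parity packet: the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity):
    """Rebuild the one packet marked None using the parity packet.

    XORing the parity with every packet that did arrive cancels them out
    and leaves the missing one.
    """
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one lost packet")
    present = [packet for packet in received if packet is not None]
    repaired = list(received)
    repaired[missing[0]] = xor_parity(present + [parity])
    return repaired
```

The appeal for multicast is that the receiver repairs the loss on its own, so the server never has to hear about it or resend anything.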
We at Onion Networks have created a very solid FEC library that will form the foundation of our open source implementations of the reliable multicast protocols. The FEC library can be had at http://onionnetworks.com/components.html