Interview with Bruce Maggs
Mihai Budiu sent in this interview with Bruce Maggs, a computer scientist who used to work at Akamai, the company that caches content for a great many popular websites. It's an interesting look at how solving research problems combines with starting up a new company.
Akamai shares a block with the MIT Laboratory for Computer Science. Recently, there was a despicable, unprovoked snowball attack on innocent MIT graduate students by Akamai customer care thugs. (Well, okay, there's a little more to the story... :-) But anyway, differences will be settled in a mathematical/theoretical computer science shootout on the evening of April 3. Should be fun.
This is completely bass-ackwards. The more popular a piece of content becomes, the harder it is to get, even though many, many more copies of it are in circulation. If a server has to send out 1000 copies of a file, why can't some of the clients share those copies with each other?
Potential solutions to this problem can be derived from systems that have already found a way around it, such as Gnutella and any multicast (MCAST) implementation.
Gnutella, although its network model has other problems, alleviates this by requiring (or at least encouraging) all clients to cache any content they download and share it for redistribution, which increases the number of available copies. MCAST and other streaming technologies handle the problem by letting the server send one copy of the content that many clients can share... this is why we don't have to wait for TV/radio shows to download.
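To put rough numbers on the difference, here is a minimal Python sketch under idealized assumptions: every node uploads exactly one full copy per round, peers find each other instantly, and nobody leaves. The function names are hypothetical. With the server alone, 1000 clients take 1000 rounds; when every client that has finished also re-shares, as Gnutella encourages, the number of copies doubles each round.

```python
def rounds_server_only(n_clients: int) -> int:
    # Hypothetical model: only the origin server uploads, one full copy per round,
    # so n clients need n rounds.
    return n_clients


def rounds_with_resharing(n_clients: int) -> int:
    # Hypothetical model: every node that already holds a copy (server included)
    # uploads one copy per round, so holders grow 1, 2, 4, 8, ...
    holders, rounds = 1, 0
    total = n_clients + 1  # the server plus every client
    while holders < total:
        holders = min(2 * holders, total)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (10, 1000):
        print(n, rounds_server_only(n), rounds_with_resharing(n))
```

`rounds_with_resharing(1000)` returns 10, since 2^10 = 1024 is the first power of two that covers the server plus 1000 clients. Real networks have asymmetric upload links and churn, so actual numbers will differ; the point is the logarithmic versus linear growth.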
The problem with universally applying an MCAST-type solution to the internet is that the internet is not like TV and radio: the internet is supposed to be content-on-demand. If you turn on your TV five minutes before a show, you can't start watching it early; similarly, if you tune in five minutes late, you can't start back at the beginning (TiVo users aside). I think many /. readers would go into shock if they could only read Slashdot on the hour, every hour. (Sidenote: one potential workaround for really busy sites is to broadcast the data continuously, restarting every x seconds, so that a fresh copy always begins soon. The problem with this is that users with slower connections won't be able to keep up, and users with faster connections will be limited to whatever rate the server is streaming at. Also, the server will keep broadcasting regardless of how much traffic it gets, clogging up its bandwidth.)
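The broadcast-every-x-seconds workaround is essentially a data carousel. The Python sketch below assumes the file is split into equal chunks and the server sends one chunk per time slot in a fixed cycle; `slots_until_complete` and its `stride` parameter are hypothetical, with `stride=k` modelling a slow receiver that can only absorb every k-th slot. A receiver that keeps up finishes after exactly one cycle no matter when it joins; a slower one takes longer, and if its stride shares a factor with the cycle length it never sees some chunks at all.

```python
from typing import Optional


def slots_until_complete(
    join_slot: int, n_chunks: int, stride: int = 1, max_slots: int = 10_000
) -> Optional[int]:
    """Slots a receiver waits to collect every chunk of a looping broadcast.

    Hypothetical model: in slot t the server sends chunk t % n_chunks.
    A receiver with stride k only catches slots join_slot, join_slot + k, ...
    Returns None if it cannot collect every chunk within max_slots.
    """
    seen = set()
    for elapsed in range(0, max_slots, stride):
        seen.add((join_slot + elapsed) % n_chunks)
        if len(seen) == n_chunks:
            return elapsed + 1  # count the slot in which the last chunk arrived
    return None  # some chunks never line up with the slots this receiver catches


if __name__ == "__main__":
    print(slots_until_complete(7, 5))            # keeps up: one full cycle
    print(slots_until_complete(7, 5, stride=2))  # slow receiver: almost two cycles
    print(slots_until_complete(0, 4, stride=2))  # stride shares a factor: never done
```

For a 5-chunk cycle, `slots_until_complete(7, 5)` returns 5 and `slots_until_complete(7, 5, stride=2)` returns 9, while `slots_until_complete(0, 4, stride=2)` returns `None` because that receiver only ever sees chunks 0 and 2.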
Gnutella is a much better solution. I'm not going to try to work out the details, but stick with me for the big picture. When a user hits a webpage, even with the current model, most of the content is cached on the local hard drive, or sometimes somewhere between the user and the server. What if everyone's browser were capable of serving requests for that cached data? This would not be efficient for sites with only a little traffic, but for /.ted sites or CNN and the like, it would work very well. The problem is finding another client that has the data you want cached; this might be solvable either with peering groups (as routers and Gnutella do) or with a central server that tracks it all (like Napster). However, this gives malicious users a chance to replace CNN's banner with their own ads, etc., though perhaps that could be worked around with some sort of trust-metric system.
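A minimal Python sketch of the central-server variant (the Napster-style option) follows. `CacheTracker`, `announce`, `lookup`, and `verify` are hypothetical names. For the tampering problem it swaps in a content hash rather than the trust metric suggested above: it assumes the origin site publishes a SHA-256 digest of each page over a channel clients already trust, so a copy fetched from a peer can be checked before it is displayed.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set


class CacheTracker:
    """Hypothetical central index (Napster-style): URL -> peers caching it."""

    def __init__(self) -> None:
        self._peers: Dict[str, Set[str]] = defaultdict(set)

    def announce(self, url: str, peer: str) -> None:
        # A browser reports that it now holds a cached copy of `url`.
        self._peers[url].add(peer)

    def lookup(self, url: str) -> List[str]:
        # Peers to try before falling back to the origin server (sorted for stable output).
        return sorted(self._peers.get(url, set()))


def verify(content: bytes, expected_sha256: str) -> bool:
    # Assumption: the origin publishes this digest over a trusted channel.
    # A peer's copy is accepted only if it hashes to exactly that value.
    return hashlib.sha256(content).hexdigest() == expected_sha256


if __name__ == "__main__":
    tracker = CacheTracker()
    tracker.announce("http://example.com/", "peer-a")
    page = b"<html>original page</html>"
    digest = hashlib.sha256(page).hexdigest()
    print(tracker.lookup("http://example.com/"))
    print(verify(page, digest))                                  # untouched copy
    print(verify(page.replace(b"original", b"ad"), digest))      # tampered copy
```

A browser would call `lookup` first, try the listed peers, keep only a copy for which `verify` returns `True`, and fall back to the origin server otherwise. A hash only works for pages that are identical for every visitor; personalized content would still have to come from the origin.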
Well, there's my two cents; sorry if it's incoherent.