Interview with Bruce Maggs
Mihai Budiu sent in this interview with Bruce Maggs, a computer scientist who used to work at Akamai, the company which caches content for a great many popular websites. An interesting look at the combination of solving research problems and starting up a new company.
And all the CS students at CMU doing their Graphics projects start wondering:
What the hell happened to the CS server..? I can't get anywhere! Bleh, might as well read Slashdot...
Oh, *that's* what happened to the CS server...
--
preview is your friend.
Chris Cothrun
Curator of Chaos
Bleh!
The benefits of HTTP 1.1 (look elsewhere for my post with links about this) come as much from creating fewer TCP connections, and from transferring the images in a page in one big chunk instead of lots of little requests, as from compression itself.
Think real hard for a minute. The few hundred K or less of HTML and images on an average web page being sucked through a 56K modem are going to be much slower than even virtual memory from a swap file! Memory and processor speed are the last of your considerations.
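To put a rough number on that, here is a back-of-the-envelope sketch. The 200 KB page size is an assumption standing in for "a few hundred K", and the modem is assumed to run at its full nominal 56 kbit/s, which real lines rarely manage.

```python
def transfer_seconds(page_bytes: int, link_bits_per_second: int) -> float:
    """Time to push page_bytes through a link, ignoring latency and protocol overhead."""
    return page_bytes * 8 / link_bits_per_second


PAGE_BYTES = 200 * 1024   # assumed page weight: HTML plus images
MODEM_BPS = 56_000        # nominal 56K modem rate (optimistic)

if __name__ == "__main__":
    print(f"{transfer_seconds(PAGE_BYTES, MODEM_BPS):.1f} seconds on the wire")
```

Tens of seconds on the wire is the scale to compare against; the CPU time spent decompressing the same page is tiny by comparison.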
Chris Cothrun
Curator of Chaos
Bleh!
Apparently the improvements span more than just compressing stuff. HTTP 1.1 has provision for maintaining a TCP connection for the duration of the transfer of a page and its elements instead of creating a new TCP connection for each page element.
Scroll down about halfway for the tables. A quick glance shows that compression works best for low bandwidth connections (naturally) and that the other improvements also made a difference.
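For anyone who wants to see the persistent-connection part in action, here is a minimal sketch using only the Python standard library. It starts a throwaway local server and counts how many TCP connections it takes to fetch a page and three images, with and without connection reuse. The handler name, the paths, and the `count_connections` helper are all made up for illustration, and it says nothing about the numbers in the linked tables.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets the server keep the connection open
    tcp_connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.tcp_connections += 1

    def do_GET(self):
        body = f"contents of {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(paths, reuse):
    """Fetch every path and return how many TCP connections the server saw."""
    CountingHandler.tcp_connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        conn = None
        for path in paths:
            if conn is None or not reuse:
                conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", path)
            conn.getresponse().read()   # drain the body before the next request
            if not reuse:
                conn.close()            # HTTP 1.0 style: one connection per element
        if reuse and conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.tcp_connections


if __name__ == "__main__":
    elements = ["/index.html", "/logo.gif", "/banner.gif", "/photo.jpg"]
    print("one connection per element:", count_connections(elements, reuse=False))
    print("persistent connection:     ", count_connections(elements, reuse=True))
```

Each avoided connection saves a TCP handshake, which matters most on a high-latency modem link.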
Chris Cothrun
Curator of Chaos
Bleh!
Gentle readers of Slashdot, do not let yourselves be deceived by the ravings of these pathological liars in LCS, the rotting remains of a once-great department, the dregs left behind when the real talent left to form Akamai. Read the full story and decide for yourself.
In fact, they're trying to determine the optimal first move for Tic Tac Toe down to the exact box. I bet you never knew there was so much math involved, eh?
Anyone with good thoughts? Is there a justification for the Akamai patent rattling, and has their fight with Digital Island been resolved? We were going to go with them for some caching but pulled out because of their patent position. Would love to find out that this has become a moot point.
I know it's been said before, but it's worth saying again -- The way to increase the market share of alternate OSes is not to persuade users to install and use Linux. The way is to persuade users to use open File Formats and Protocols/APIs. Diversification of the OS market place will follow as a natural consequence.
In the example above, when Akamai needed to deliver the open file formats and protocols of the Internet, they had several choices. They decided that Linux best suited their needs. But when they needed to stream Windows Media, Win2000 was their only realistic choice.
I may be a pessimist... but I fear that WMF is a problem that Open Source cannot overcome. Even if we achieved the tremendous feat of catching up with a patent free CODEC and streaming protocol that is comparable to ASF/WMF, we still would not have success. Big Media thinks OSS is evil -- and MS will pander to Big Media's obsession with total IP control.
I hate to be gloomy, but I think that ASF/WMF is the first viable long-term Internet wedge for MS. I think .NET will be the second, and more are sure to follow.
The future just doesn't look bright for alternate OSes from my POV... But then that's just my opinion... I could be wrong!
Jonathan Weesner
Level D Flight Simulators using Linux from NLX Corporation. That's my idea of FUN!
As it says in the interview, Bruce Maggs is a professor at Carnegie Mellon. I was in a discrete math course that he taught about three years ago, and one of my classmates produced this comic-book-style look at what "Maggs-neto" does with his spare time (namely, plot world domination with the aid of a mind-controlled pack of Spice Girls). Bruce was a good sport about the whole thing -- images and references to the comic's story began appearing in his lecture notes & slides! Sadly, it was never finished...
Without one you cannot download the paper... The description sounds interesting, though.
It's so weird to load Slashdot, look at the top article, and think, "Hey, that's my professor for 213."
Take that, MIT!
For more information, click here.
What if everyone's browser was capable of serving requests for that cached data? This would not be efficient for sites with only a little traffic, but for /.ted sites or CNN and the like, it would work very well. The problem is finding another client that has the data you want cached; this might be resolvable using either peering groups (like routers and gnutella), or using a central server to track it all (like napster).
There're tons of companies/groups working on variations of the same idea. To name a few:
swarmcast, allcast, etc. So far none of them have taken off. I'll leave it as an exercise to the reader to figure out why.
Akamai shares a block with the MIT Laboratory for Computer Science. Recently, there was a despicable, unprovoked snowball attack on innocent MIT graduate students by Akamai customer care thugs. (Well, okay, there's a little more to the story... :-) But anyway, differences will be settled in a mathematical/theoretical computer science shootout on the evening of April 3. Should be fun.
Don't include them, or give them a lower priority.
If they don't have the client (I imagined it as a browser plugin, but it could be an OS feature; actually, if it's Windows, the plugin is an OS feature ;) ), then they wouldn't be on the 'list', so to speak.
Thirdly, you would be using the other person's (the hosts') upload bandwidth, and bandwidth is something no one wants to sacrifice.
Yes, but it's upstream bandwidth. How much upstream bandwidth does the average 'net user utilize each day?
Uplink bandwidth is limited, but it's still faster than some sites I've seen slashdotted...
This is completely bass-ackwards. The content that becomes more popular becomes harder to get, even though many, many more copies are made available. If said server sends out these 1000 copies of a file, why can't some of the clients share those 1000 copies?
Potential solutions to this problem can be derived from systems that have already found a way around it, such as Gnutella and any MCAST implementation.
Gnutella, although its network model has other problems, alleviates the previously mentioned problem by forcing (or suggesting that) all clients cache and share for redistribution any content they download, thus increasing the number of available copies. MCAST, and other streaming technologies, handle the problem by allowing the server to send one copy of the content that can be shared by many clients... this is why we don't have to wait for TV/Radio shows to download.
The problem with universally applying an MCAST-type solution to the internet is that the internet is not like TV and radio: the internet is supposed to be content-on-demand. If you turn on your TV five minutes before a show, you can't start watching it early; similarly, if you tune in five minutes late you can't start back at the beginning (TiVo users aside). I think many /. readers would go into shock if they could only read slashdot on the hour, every hour. (Sidenote: one potential workaround for really busy sites is to broadcast the data every x number of seconds continuously, so that the data restarts often enough. The problem with this is that users with slower connections won't be able to keep up, and users with faster connections will be limited to whatever the server's streaming at. Also, the server will keep broadcasting regardless of what sort of traffic it gets, clogging up its bandwidth.)
Gnutella is a much better solution. I'm not going to try to work out the details, but stick with me for the big picture. When a user hits a webpage, even with the current model, all of the content is cached on the local hard drive, or sometimes somewhere in between the user and the server. What if everyone's browser was capable of serving requests for that cached data? This would not be efficient for sites with only a little traffic, but for /.ted sites or CNN and the like, it would work very well. The problem is finding another client that has the data you want cached; this might be resolvable using either peering groups (like routers and gnutella), or using a central server to track it all (like napster). This, however, gives bad users a chance to replace CNN's banner with their own ads etc., but this could perhaps be worked around with some sort of trust metric system?
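Here is a rough sketch of the big picture, under loud assumptions: it takes the central-tracker route (the napster-style option), and instead of a real trust metric it swaps in a much simpler check, where the origin site publishes a SHA-256 digest of each item and a client rejects any peer copy that does not match. The `Tracker` class, the `fetch` function, and the dict-as-browser-cache peers are all hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


class Tracker:
    """Hypothetical central index: which peers cache a URL, and its trusted digest."""

    def __init__(self):
        self.digests = {}   # url -> digest published by the origin site
        self.peers = {}     # url -> list of peer caches (plain dicts here)

    def publish(self, url: str, content: bytes) -> None:
        self.digests[url] = digest(content)

    def register(self, url: str, peer_cache: dict) -> None:
        self.peers.setdefault(url, []).append(peer_cache)


def fetch(url: str, tracker: Tracker, origin: dict):
    """Try peers first; fall back to the origin if no peer has a matching copy."""
    for peer_cache in tracker.peers.get(url, []):
        content = peer_cache.get(url)
        if content is not None and digest(content) == tracker.digests.get(url):
            return content, "peer"
    return origin[url], "origin"


if __name__ == "__main__":
    url = "http://cnn.example/banner.gif"          # made-up URL
    origin = {url: b"real CNN banner"}
    tracker = Tracker()
    tracker.publish(url, origin[url])
    tracker.register(url, {url: b"my own ad instead"})   # bad user
    tracker.register(url, {url: b"real CNN banner"})     # honest cache
    print(fetch(url, tracker, origin))
```

The tampered copy is skipped because its digest does not match, and the busy origin server only has to hand out small digests rather than full files, at least while some honest peer holds a copy.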
Well, there's my two cents, sorry if it's incoherent.
Mostly Akamai is in the image business, since images have been shown to take up most bandwidth (in some cases up to 85%). The reasons they decrease download times are twofold. First, they're probably physically closer to the client than the source otherwise would be. Second, they probably have more bandwidth.
Even so, you could be right. The overhead shifts from the image download to the DNS. Thus it wouldn't make sense for Joe Homeuser to "akamaize", but it does for Yahoo and CNN simply because there are so many people over such a diverse area attempting to retrieve their pages.
By the way, the astute will notice that the diagram in that article is wrong. The client contacts the client name server, which then will contact Akamai's name servers. This means that the DNS optimization is done for the client name server and not for the client itself.
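A toy model of that lookup path, to make the point concrete. This is not Akamai's actual algorithm: the region names, hostnames, and the "pick the edge in the asker's region" rule are all invented for illustration. The only thing it is meant to show is that the authoritative name server sees the resolver, never the client.

```python
# Hypothetical edge servers, keyed by the region they sit in.
EDGE_SERVERS = {
    "boston": "edge-bos.cdn.example",
    "london": "edge-lon.cdn.example",
}
DEFAULT_EDGE = "edge-default.cdn.example"


def authoritative_lookup(hostname: str, asker_region: str) -> str:
    """Stand-in for the CDN's name server: it only knows who sent the query."""
    return EDGE_SERVERS.get(asker_region, DEFAULT_EDGE)


class ClientNameServer:
    """Stand-in for the ISP or company resolver the client is configured to use."""

    def __init__(self, region: str):
        self.region = region
        self.cache = {}

    def resolve(self, hostname: str) -> str:
        if hostname not in self.cache:
            # The query goes out from the resolver, so its region is what counts.
            self.cache[hostname] = authoritative_lookup(hostname, self.region)
        return self.cache[hostname]


def client_lookup(client_region: str, resolver: ClientNameServer, hostname: str) -> str:
    # client_region is deliberately unused: it never reaches the CDN's name server.
    return resolver.resolve(hostname)


if __name__ == "__main__":
    boston_resolver = ClientNameServer("boston")
    # A user dialled in from London but pointed at a Boston name server:
    print(client_lookup("london", boston_resolver, "images.site.example"))
```

In this model the London user is sent to the Boston edge, which is exactly the mismatch you get whenever a client sits far from its name server.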
Here's an older (and shorter) interview (from MIT's Technology Review) with Tom Leighton, the guy who cofounded Akamai. The article is titled "Akamai's algorithms" and it treats many of the same topics mentioned in the post.
Is Akamai caching websites, or are they serving images for websites? If they are caching the websites, how does that increase the speed of download for a specific website? A mirror may help remove the load off a server, but the end user is still downstream of any bottleneck in any system, especially the original system that is serving cached webpages through Akamai, as the original server is handling all requests and still has to pass them on.
If Akamai is serving images for the websites, doesn't that increase the download time (albeit not considerably in a theoretical, perfectly stable connection), as the end user is being "served" from multiple systems?
If I understood the portion of the interview pointing at Akamai correctly, the system is only good for the servers. The end user is making multiple, simultaneous requests for the page from several different servers; this should (technically) bring into account bottlenecks between the systems.
Of course, the practice is used all the time via doubleclick and the other ad agencies, and page time isn't too difficult to contend with (I assume) on a non-broadband connection, but when one introduces advertisements, downloading the images, and getting any server database calls from MULTIPLE servers, the backup is potentially paralyzing...
Black and grey are both shades of white.
I guess on a related note, would it be possible to design a system where all the data on a page is compressed (say, into a bzipped tarball) and decompressed by the client? How much power would be required to do multiple extremely quick bzips?
The picture is from this page, which describes their network tech. Here is the original picture.
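One way to get a feel for the cost is to just time it. The sketch below uses Python's standard `zlib` and `bz2` modules on a made-up, highly repetitive chunk of HTML, so the ratios it prints are flattering and are not a benchmark of any real page. The `compression_report` helper and the sample markup are my own invention.

```python
import bz2
import time
import zlib


def compression_report(data: bytes) -> dict:
    """Compress data with zlib and bz2; record output size and time taken."""
    report = {}
    codecs = (
        ("zlib", zlib.compress, zlib.decompress),
        ("bz2", bz2.compress, bz2.decompress),
    )
    for name, compress, decompress in codecs:
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:          # sanity check: must round-trip
            raise ValueError(f"{name} did not round-trip")
        report[name] = {
            "bytes": len(packed),
            "ratio": len(packed) / len(data),
            "seconds": elapsed,
        }
    return report


# Assumed sample: repetitive table markup, the kind of text that compresses well.
SAMPLE_HTML = b"<tr><td class='story'>Interview with Bruce Maggs</td></tr>\n" * 2000

if __name__ == "__main__":
    for name, stats in compression_report(SAMPLE_HTML).items():
        print(f"{name}: {stats['bytes']} bytes "
              f"({stats['ratio']:.1%} of original) in {stats['seconds']:.4f}s")
```

Keep in mind that GIF and JPEG images are already compressed, so bundling them into a bzipped tarball would shrink the HTML but do little for the images themselves.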
Enjoy
Trust the source!
The article started nicely but then it went into a flurry of Akamai marketing BS.
Sure Akamai does some neat stuff, but so does a company called Edgix, which does it via satellite to an ISP bypassing the need to go through hops upon hops of information. What I found neat about Edgix' technology was (although this post sounds like a marketing ploy) they sell their caching servers which poll the most sought after websites' content then cache it hourly, daily, whatever. Then when someone looks something up, it pulls it directly off of the ISP's server which means faster content delivery.
But you don't see me interviewing their staff so that they can flood an article with marketing while masquerading as an interview, do you?
Not only that but it does this on a satellite based mechanism which means if Globix, UUNet, Exodus, Level3 all blow up, you'll still get a cached slashdot without routes being broken, and a slew of timeout errors.
Well... At least I got to see where he went to school though, such an informative interview.
Toy truck thieves still at large
360 degrees of Karma
While you worry about a game, I worry about NASDAQ, IPO's, fractions when the bell tolls on Wall Street, so while my ISP delivers the content I need, and my bank account gets heavier, keep fragging on.
Different strokes for different folks I guess.
Now that you mention it though, I'd like to see how your solution would fly when on a business trip on an airplane. Oh those telco wires at 30,000 feet, how fast they zoom that data through, don't they?
360 degrees of Karma
Clearly Akamai was high on stock when they built that place. I'd be willing to bet that if they had it to do all over again (with a $10 and falling stock price), that room would consist of a large pull-down atlas, three DECStations and an old Mac Plus.
That picture of the monitoring system is taken directly from the movie "War Games".
The article is an obvious attempt to obscure their real purpose: to establish a worldwide tic-tac-toe solving distributed supercomputer.