Slashdot Mirror


Interview with Bruce Maggs

Mihai Budiu sent in this interview with Bruce Maggs, a computer scientist who used to work at Akamai, the company which caches content for a great many popular websites. An interesting look at the combination of solving research problems and starting up a new company.

  1. Here goes... by Anonymous Coward · · Score: 2

    And all the CS students at CMU doing their Graphics projects start wondering:

    What the hell happened to the CS server..? I can't get anywhere! Bleh, might as well read Slashdot...

    Oh, *that's* what happened to the CS server...

  2. Aren't you thinking of Freenet? by Sanity · · Score: 2
    Freenet does largely what you describe, in a logarithmically scalable manner (which differentiates it from Gnutella, which isn't very scalable). Freenet caches data automatically, moving it closer to demand and replicating popular data, whereas Gnutella only shares what is already on your machine. If you are interested in learning more, I suggest reading this paper.

    --

  3. screwed up the link by crisco · · Score: 2
    should be w3.org.

    preview is your friend.

    Chris Cothrun
    Curator of Chaos

    --

    Bleh!

  4. uh, I hate to point out... by crisco · · Score: 2
    ... that every image you're downloading is compressed in the first place. Yes, GIF, JPEG and even PNG are compression formats. Your browser has to allocate memory for the downloaded compressed file and a larger chunk of memory for the flattened bitmap.

    The benefits of a compression system in HTTP 1.1 (look elsewhere for my post with links about this) are as much in the reduction of TCP connection creation and the transfer of the images in a page in one big chunk instead of lots of little requests.

    Think real hard for a minute. The few hundred K or less of HTML and images on an average web page being sucked through a 56K modem are going to be much slower than even virtual memory from a swap file! Memory and processor speed are the last of your considerations.

    Chris Cothrun
    Curator of Chaos

    --

    Bleh!

  5. Already an optional part of HTTP 1.1 by crisco · · Score: 3
    More information at the w3.org page on 'pipelining'.

    Apparently the improvements span more than just compressing stuff. HTTP 1.1 has a provision for maintaining a TCP connection for the duration of the transfer of a page and its page elements instead of creating a new TCP connection for each page element.

    Scroll down about halfway for the tables. A quick glance shows that compression works best for low bandwidth connections (naturally) and that the other improvements also made a difference.

    Chris Cothrun
    Curator of Chaos

    --

    Bleh!
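
    The persistent-connection behaviour described in the comment above can be sketched roughly as follows. This is only an illustration using Python's standard library on the loopback interface; the handler, the `connections_seen` set, and the `fetch_page_elements` helper are made-up names for the demo, not anything from the w3.org study.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Distinct client (host, port) pairs the demo server has seen.
connections_seen = set()


class Handler(BaseHTTPRequestHandler):
    # Declaring HTTP/1.1 lets http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        connections_seen.add(self.client_address)
        body = f"element {self.path}".encode()
        self.send_response(200)
        # A Content-Length is required so the client knows where each
        # response ends without the server closing the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_page_elements(paths):
    """Fetch every path over a single TCP connection and return the bodies."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)  # reuses the already-open socket
            bodies.append(conn.getresponse().read().decode())
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies
```

    Calling `fetch_page_elements(["/a", "/b", "/c"])` and then checking `len(connections_seen)` shows whether the three requests shared one connection, which is the saving the comment is pointing at.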

  6. Re:Akamai vs. MIT by jaffray · · Score: 3
    Recently, there was a despicable, unprovoked snowball attack on innocent MIT graduate students by Akamai customer care thugs.
    "Unprovoked"? So, your ragtag little band of punks just happened to tromp out an insult in the snow outside our office while randomly wandering around building snowmen? I think not.

    Gentle readers of Slashdot, do not let yourselves be deceived by the ravings of these pathological liars in LCS, the rotting remains of a once-great department, the dregs left behind when the real talent left to form Akamai. Read the full story and decide for yourself.

  7. Clearly you are correct. by FallLine · · Score: 2

    In fact, they're trying to determine the optimal first move for Tic Tac Toe down to the exact box. I bet you never knew there was so much math involved, eh?

  8. Patent Lawsuit by augustz · · Score: 2
    Why no question about the patent issues Akamai has been stirring up with Digital Island? You'd think this would directly impact academia, and I'd have been interested in hearing the answer from an academic who worked at Akamai.

    Anyone with good thoughts? Is there a justification for the Akamai patent rattling? Has their fight with Digital Island been resolved? We were going to go with them for some caching but pulled out because of their patent position. Would love to find out that this has become a moot point.

  9. Why Akamai does and does not use Linux by WeeGadget · · Score: 3
    From the interview:
    It is true that most of Akamai's servers are Linux servers. However we also run a large number of Windows 2000 servers, in particular the servers delivering Windows Media format.
    More evidence that proprietary File Formats and Protocols/APIs are the two tracks that carry the MS Monopoly Railroad forward.

    I know it's been said before, but it's worth saying again -- The way to increase the market share of alternate OSes is not to persuade users to install and use Linux. The way is to persuade users to use open File Formats and Protocols/APIs. Diversification of the OS market place will follow as a natural consequence.

    In the example above, when Akamai needed to deliver the open file formats and protocols of the Internet, they had several choices. They decided that Linux best suited their needs. But when they needed to stream Windows Media, Win2000 was their only realistic choice.

    I may be a pessimist... but I fear that WMF is a problem that Open Source cannot overcome. Even if we achieved the tremendous feat of catching up with a patent free CODEC and streaming protocol that is comparable to ASF/WMF, we still would not have success. Big Media thinks OSS is evil -- and MS will pander to Big Media's obsession with total IP control.

    I hate to be gloomy, but I think that ASF/WMF is the first viable long-term Internet wedge for MS. I think .NET will be the second, and more are sure to follow.

    The future just doesn't look bright for alternate OSes from my POV... But then that's just my opinion... I could be wrong!

    Jonathan Weesner

    Level D Flight Simulators using Linux from NLX Corporation. That's my idea of FUN!

    1. Re:Why Akamai does and does not use Linux by steveha · · Score: 2
      I fear that WMF is a problem that Open Source cannot overcome.

      I have high hopes for Ogg Vorbis.

      We /. types like it because it's free. Big Business will like it because they will never have to pay anything to use it. The only people who won't like it will be the ones who want to lock up the music, but in the long run they are doomed to fail.

      (Given a choice between paying for music in WMF format and paying for music in a CD format, I will buy the CD every time. I predict that enough other people will do the same to ensure that WMF never takes over the world.)

      steveha

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
  10. Maggs-neto by Cort · · Score: 3

    As it says in the interview, Bruce Maggs is a professor at Carnegie Mellon. I was in a discrete math course that he taught about three years ago, and one of my classmates produced this comic-book-style look at what "Maggs-neto" does with his spare time (namely, plot world domination with the aid of a mind-controlled pack of Spice Girls). Bruce was a good sport about the whole thing -- images and references to the comic's story began appearing in his lecture notes & slides! Sadly, it was never finished...

  11. ACM account required by harmonica · · Score: 2

    Without one you cannot download the paper... The description sounds interesting, though.

  12. Wow. by generic-man · · Score: 2

    It's so weird to load Slashdot, look at the top article, and think, "Hey, that's my professor for 213."

    Take that, MIT!

    --
    For more information, click here.
  13. already done. by gargle · · Score: 2

    What if everyone's browser was capable of serving requests for that cached data? This would not be efficient for sites with only a little traffic, but for /.ted sites or CNN and the like, it would work very well. The problem is finding another client that has the data you want cached; this might be resolvable using either peering groups (like routers and Gnutella), or using a central server to track it all (like Napster).

    There're tons of companies/groups working on variations of the same idea. To name a few:
    swarmcast, allcast, etc. So far none of them have taken off. I'll leave it as an exercise to the reader to figure out why.

  14. Akamai vs. MIT by e_lehman · · Score: 5

    Akamai shares a block with the MIT Laboratory for Computer Science. Recently, there was a despicable, unprovoked snowball attack on innocent MIT graduate students by Akamai customer care thugs. (Well, okay, there's a little more to the story... :-) But anyway, differences will be settled in a mathematical/theoretical computer science shootout on the evening of April 3. Should be fun.

  15. Re:Methods of Caching the Internet by zaius · · Score: 2
    Firstly, most internet users are still on those slow dialups.

    Don't include them, or give them a lower priority.

    ...unless they have some kind of similar client, you're just going to be sitting there aimlessly

    If they don't have the client (I imagined it as a browser plugin, but it could be an OS feature, actually, if it's windows the plugin is an os feature ;) ), then they wouldn't be on the 'list', so to speak.

    Thirdly, you would be using the other person's (the hosts') upload bandwidth, and bandwidth is something no one wants to sacrifice.

    Yes, but it's upstream bandwidth. How much upstream bandwidth does the average 'net user utilize each day?

  16. Re:Methods of Caching the Internet by zaius · · Score: 2
    Yes, some sort of security would be needed... maybe a trust system where AOL and the like get first priority, then another ring of users etc... joe-schmuck on his 14.4 gets the lowest priority...

    Uplink bandwidth is limited, but it's still faster than some sites I've seen slashdotted...

  17. Methods of Caching the Internet by zaius · · Score: 5
    Akamai is just one example of different systems people have come up with for working around the inherent flaws of the internet (which are clearly demonstrated by the "Slashdot Effect"). The problem is, everyone wants to look at the same content at the same time; under the current system, the server has to send out one copy of the data to each client that requests it, so if 1000 clients request it, the server has to send 1000 copies.

    This is completely bass-ackwards. The content that becomes more popular becomes harder to get, even though many, many more copies are made available. If said server sends out these 1000 copies of a file, why can't some of the clients share those 1000 copies?

    Potential solutions to this problem can be derived from systems that have already found a way around it, such as Gnutella and any MCAST implementation.

    Gnutella, although its network model has other problems, alleviates the previously mentioned problem by forcing (or suggesting that) all clients cache and share for redistribution any content they download, thus increasing the number of available copies. MCAST, and other streaming technologies, handle the problem by allowing the server to send one copy of the content that can be shared by many clients... this is why we don't have to wait for TV/Radio shows to download.

    The problem with universally applying an MCAST-type solution to the internet is that the internet is not like TV and radio: the internet is supposed to be content-on-demand. If you turn on your TV five minutes before a show, you can't start watching it early; similarly, if you tune in five minutes late you can't start back at the beginning (TiVo users aside). I think many /. readers would go into shock if they could only read slashdot on the hour, every hour. (Sidenote: one potential workaround for really busy sites is to broadcast the data every x number of seconds continuously, so that the data restarts often enough. The problem with this is that users with slower connections won't be able to keep up, and users with faster connections will be limited to whatever the server's streaming at. Also, the server will keep broadcasting regardless of what sort of traffic it gets, clogging up its bandwidth).

    Gnutella is a much better solution. I'm not going to try to work out the details, but stick with me for the big picture. When a user hits a webpage, even with the current model, all of the content is cached on the local hard drive, or sometimes somewhere in between the user and the server. What if everyone's browser was capable of serving requests for that cached data? This would not be efficient for sites with only a little traffic, but for /.ted sites or CNN and the like, it would work very well. The problem is finding another client that has the data you want cached; this might be resolvable using either peering groups (like routers and Gnutella), or using a central server to track it all (like Napster). This, however, gives bad users a chance to replace CNN's banner with their own ads etc., but this could perhaps be worked around with some sort of trust metric system?

    Well, there's my two cents, sorry if it's incoherent.
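
    For the big picture in the comment above, here is a toy simulation of the "central server to track it all" variant. Everything in it (the `Origin`, `Tracker`, and `Peer` classes and the `simulate` function) is hypothetical and exists only to count how often the origin server gets hit; it ignores trust, bandwidth, and stale content entirely.

```python
class Origin:
    """Stands in for the overloaded web server."""

    def __init__(self, pages):
        self.pages = pages
        self.hits = 0  # number of copies the origin had to send

    def get(self, url):
        self.hits += 1
        return self.pages[url]


class Tracker:
    """Central index of which peers hold which URL (the Napster-like option)."""

    def __init__(self):
        self.holders = {}  # url -> list of peers with a cached copy

    def register(self, url, peer):
        self.holders.setdefault(url, []).append(peer)

    def find(self, url):
        peers = self.holders.get(url, [])
        # Always picks the first holder; a real system would spread the load.
        return peers[0] if peers else None


class Peer:
    """A browser that serves its own cache to other browsers."""

    def __init__(self, origin, tracker):
        self.origin, self.tracker, self.cache = origin, tracker, {}

    def fetch(self, url):
        if url not in self.cache:
            holder = self.tracker.find(url)
            # Ask another peer first; fall back to the origin server.
            self.cache[url] = holder.cache[url] if holder else self.origin.get(url)
            self.tracker.register(url, self)
        return self.cache[url]


def simulate(num_clients, url="/story"):
    """Return how many copies the origin sent for num_clients readers."""
    origin = Origin({url: "<html>front page</html>"})
    tracker = Tracker()
    for _ in range(num_clients):
        Peer(origin, tracker).fetch(url)
    return origin.hits
```

    With this setup `simulate(1000)` has the origin send a single copy instead of 1000, at the cost of trusting whatever the first peer hands back.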

    1. Re:Methods of Caching the Internet by srichman · · Score: 2
      The problem with universally applying an MCAST-type solution to the internet is that the internet is not like TV and radio: the internet is supposed to be content-on-demand. If you turn on your TV five minutes before a show, you can't start watching it early; similarly, if you tune in five minutes late you can't start back at the beginning.

      Digital Fountain seeks to solve this problem.

    2. Re:Methods of Caching the Internet by dachshund · · Score: 2

      This is exactly what Akamai does, only the ISPs don't actually get to run the servers (thus eliminating the mess that would invariably result) and providing powerful revenue opportunities to a plucky little Boston startup.

    3. Re:Methods of Caching the Internet by dachshund · · Score: 2
      Not to mention that any client could simply lie about the content it has: "Yes, I've got slashdot.org-- ok, it may look like a bunch of porn links, but..."

      The only way to solve that is to have some way of verifying content, maybe a signature or something, but then you've got to have a third party signing everything. This is all aside from the problem of a publisher needing to modify a web page once released (a big one.)

      And of course, uplink bandwidth is very limited on the majority of DSL/Cable systems.
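
      The "some way of verifying content" idea can be sketched with a plain hash check. This is a simplification that assumes the client can get the publisher's digest over a channel it already trusts; a real signature scheme would be needed otherwise. The function names and sample page are invented for the example.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint of a page as the publisher would announce it."""
    return hashlib.sha256(content).hexdigest()


def accept_from_peer(content: bytes, published_digest: str) -> bool:
    """Accept peer-supplied content only if it matches the publisher's digest."""
    return digest(content) == published_digest


original = b"<html>News for nerds</html>"
# Assumed to be fetched from the publisher directly, not from the peer.
published = digest(original)

# A lying peer is caught, but so is an honest peer holding an older version:
# once the publisher edits the page, every cached copy stops verifying.
edited = b"<html>News for nerds (updated)</html>"
```

      Here `accept_from_peer(original, published)` passes while a substituted page fails, and `accept_from_peer(original, digest(edited))` also fails, which is the page-modification problem mentioned above.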

  18. Re:questions? by iritant · · Score: 2

    Mostly Akamai is in the image business, since images have been shown to take up most bandwidth (in some cases up to 85%). The reasons they decrease download times are twofold. First, they're probably physically closer to the client than the source otherwise would be. Second, they probably have more bandwidth.

    Even so, you could be right. The overhead shifts from the image download to the DNS. Thus it wouldn't make sense for Joe Homeuser to "akamaize", but it does for Yahoo and CNN simply because there are so many people over such a diverse area attempting to retrieve their pages.

    By the way, the astute will notice that the diagram in that article is wrong. The client contacts the client name server, which then will contact Akamai's name servers. This means that the DNS optimization is done for the client name server and not the client itself.
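
    To make the name-server point concrete, here is a toy lookup that picks a server from the address of the resolver that asked. It is not Akamai's actual mapping; the table, the addresses (all from documentation ranges), and `pick_edge` are assumptions for illustration.

```python
import ipaddress

# Hypothetical table: resolver networks -> a nearby edge server.
EDGE_BY_RESOLVER_NET = {
    ipaddress.ip_network("192.0.2.0/24"): "198.51.100.10",
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.20",
}
DEFAULT_EDGE = "198.51.100.1"


def pick_edge(resolver_ip: str) -> str:
    """Choose an edge server for the name server that asked, not the end user."""
    addr = ipaddress.ip_address(resolver_ip)
    for net, edge in EDGE_BY_RESOLVER_NET.items():
        if addr in net:
            return edge
    return DEFAULT_EDGE
```

    The authoritative side only ever sees `resolver_ip`, so a user whose name server sits far away from them gets an answer tuned for the name server's location.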

  19. Good Akamai interview by ruck · · Score: 2

    Here's an older (and shorter) interview (from MIT's Technology Review) with Tom Leighton, the guy who cofounded Akamai. The article is titled "Akamai's algorithms" and it treats many of the same topics mentioned in the post.

  20. questions? by loucura! · · Score: 2

    Is Akamai caching websites, or are they serving images for websites? If they are caching the websites, how does that increase the speed of download for a specific website? A mirror may help remove the load off a server, but the end-user still is downlink from any bottlenecks from any system. Especially the original system that is serving cached webpages through Akamai, as the original server is handling all requests, and still has to pass them on.

    If Akamai is serving images for the websites, doesn't that increase the download time, (albeit not considerably in a theoretical, perfectly stable connection) as the end-user is being "served" from multiple systems.

    If I understood the portion of the interview pointing at Akamai correctly, the system is only good for the servers. The end user is making multiple, simultaneous requests for the page from several different servers, this should (technically) bring into account bottlenecks between the systems.

    Of course, the practice is used all the time via doubleclick and the other ad agencies, and page time isn't too difficult to contend with (I assume) on a non-broadband connection, but when one introduces advertisements, downloading the images, and getting any server database calls from MULTIPLE servers, the backup is potentially paralyzing...

    --
    Black and grey are both shades of white.
  21. Re:Images are not the only bandwidth hog... by leviramsey · · Score: 2

    I guess on a related note, would it be possible to design a system where all the data on a page is compressed (say, into a bzipped tarball) and decompressed by the client? How much power would be required to do multiple extremely quick bzips?
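
    A rough sketch of the bzipped-tarball idea, using Python's standard `tarfile` module entirely in memory. The page contents are made up and deliberately repetitive, so the saving shown is not representative of real pages; `pack_page` and `unpack_page` are names invented for the example.

```python
import io
import tarfile


def pack_page(files: dict) -> bytes:
    """Server side: bundle page elements into one bzip2-compressed tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_page(blob: bytes) -> dict:
    """Client side: decompress the tarball and recover each element."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:bz2") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


# Hypothetical page: repetitive markup compresses very well.
page = {
    "index.html": b"<p>comment</p>\n" * 2000,
    "style.css": b"p { margin: 0 }\n" * 200,
}
blob = pack_page(page)
```

    Comparing `len(blob)` with the summed sizes in `page` gives a feel for the saving on text. As pointed out elsewhere in the thread, GIF, JPEG and PNG images are already compressed, so they would shrink far less.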

  22. Nope. by Maldivian · · Score: 3

    The picture is from this page, which describes their network tech. Here is the original picture.

    Enjoy

    --
    Trust the source!
  23. dont shoot the messenger by deran9ed · · Score: 2

    The article started nicely but then it went into a flurry of Akamai marketing BS.

    Sure Akamai does some neat stuff, but so does a company called Edgix, which does it via satellite to an ISP bypassing the need to go through hops upon hops of information. What I found neat about Edgix' technology was (although this post sounds like a marketing ploy) they sell their caching servers which poll the most sought after websites' content then cache it hourly, daily, whatever. Then when someone looks something up, it pulls it directly off of the ISP's server which means faster content delivery.

    But you don't see me interviewing their staff in attempts for them to flood an article while masquerading as an interview do you?

    Not only that but it does this on a satellite based mechanism which means if Globix, UUNet, Exodus, Level3 all blow up, you'll still get a cached slashdot without routes being broken, and a slew of timeout errors.

    Well... At least I got to see where he went to school though, such an informative interview.

    Toy truck thieves still at large

  24. well... by deran9ed · · Score: 2

    While you worry about a game, I worry about NASDAQ, IPO's, fractions when the bell tolls on Wall Street, so while my ISP delivers the content I need, and my bank account gets heavier, keep fragging on.

    Different strokes for different folks I guess.

    Now that you mention it though, I'd like to see how your solution would fly when on a business trip on an airplane. Oh those telco wires at 30,000 feet, how fast they zoom that data through, don't they?

  25. Re:This is obviously a hoax by dachshund · · Score: 2

    Clearly Akamai was high on stock when they built that place. I'd be willing to bet that if they had it to do all over again (with a $10 and falling stock price), that room would consist of a large pull-down atlas, three DECStations and an old Mac Plus.

  26. This is obviously a hoax by sagacious_gnostic · · Score: 3

    That picture of the monitoring system is taken directly from the movie "War Games".

    The article is an obvious attempt to obscure their real purpose; to establish a world wide tic-tac-toe solving distributed supercomputer.