Slashdot Mirror


Google Caffeine Drops MapReduce, Adds "Colossus"

An anonymous reader writes "With its new Caffeine search indexing system, Google has moved away from its MapReduce distributed number crunching platform in favor of a setup that mirrors database programming. The index is stored in Google's BigTable distributed database, and Caffeine allows for incremental changes to the database itself. The system also uses an update to the Google File System codenamed 'Colossus.'"

4 of 65 comments

  1. Awesome choice of name. by Scytheford · · Score: 5, Funny

    "This is the voice of world control. I bring you peace. It may be the peace of plenty and content or the peace of unburied death. The choice is yours: Obey me and live, or disobey and die. [...] We can coexist, but only on my terms. You will say you lose your freedom. Freedom is an illusion. All you lose is the emotion of pride. To be dominated by me is not as bad for humankind as to be dominated by others of your species. Your choice is simple."
    -Colossus.

    Source: http://www.imdb.com/title/tt0064177/

  2. I have to say... by tpstigers · · Score: 5, Funny

    I am so glad Google has moved away from the Argus platform and into the Mercedes system. It makes it so much easier for those of us who are used to programming in Gibberish. Don't get me wrong - the days of Jabberwocky code were brilliant, but it's high time we moved into the Century of the Fruitbat.

  3. Re:Sounds inefficient by kurokame · · Score: 5, Informative

    No, that's not it.

    MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on a series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."

    "[The new framework is] completely incremental," he says. When a new page is crawled, Google can update its index with the necessary changes rather than rebuilding the whole thing.

    There are still cases where Caffeine uses batch processing, and MapReduce is still the basis for myriad other Google services. But prior to the arrival of Caffeine, the indexing system was Google's largest MapReduce application, so use of the platform has been significantly, well, reduced.

    They're not still using MapReduce for the index. It's still supported in the framework for secondary computations where appropriate, and it's still used in some other Google services, but it's been straight-up replaced for the index. Colossus is not a new improved version of MapReduce, it's a completely different approach to maintaining the index.
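To make the batch-versus-incremental distinction above concrete, here is a toy sketch in Python. It is not Google's code: the inverted-index layout, the function names (`build_index_batch`, `update_index_incremental`) and the example URLs are all assumptions made up for illustration. It only shows the general idea that a batch job rebuilds the index from every page, while an incremental update touches just the entries affected by one newly crawled page.

```python
from collections import defaultdict


def build_index_batch(pages):
    """Batch style: rebuild the whole inverted index from every page."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def update_index_incremental(index, pages, url, new_text):
    """Incremental style: apply only the changes caused by one page."""
    old_words = set(pages.get(url, "").lower().split())
    new_words = set(new_text.lower().split())
    # Remove the page from words it no longer contains.
    for word in old_words - new_words:
        index[word].discard(url)
        if not index[word]:
            del index[word]
    # Add the page under words it newly contains.
    for word in new_words - old_words:
        index[word].add(url)
    pages[url] = new_text


# Hypothetical pages, purely for illustration.
pages = {
    "a.example": "google file system",
    "b.example": "distributed database",
}
index = build_index_batch(pages)

# One page is re-crawled; only its entries are touched.
update_index_incremental(index, pages, "a.example", "google caffeine index")
```

After the incremental update, the index should match what a full batch rebuild over the updated pages would produce, without having reprocessed `b.example` at all.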

  4. Re:Sounds inefficient by kurokame · · Score: 5, Informative

    Sorry, Colossus is the file system. Caffeine is the new computational framework.

    I made the same error in several posts now...but Slashdot doesn't support editing. Oh well! Everyone reads the entire thread, right?