Slashdot Mirror


MapReduce — a Major Step Backwards?

The Database Column has an interesting, if negative, look at MapReduce and what it means for the database community. MapReduce is a software framework developed by Google to handle parallel computations over large data sets on cheap or unreliable clusters of computers. "As both educators and researchers, we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications. MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is: a giant step backward in the programming paradigm for large-scale data intensive applications; a sub-optimal implementation, in that it uses brute force instead of indexing; not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago; missing most of the features that are routinely included in current DBMS; incompatible with all of the tools DBMS users have come to depend on."

4 of 157 comments (clear)

  1. may be missing the (data)points by yagu · · Score: 5, Insightful

    I don't know why this article is so harshly critical of MapReduce. They base their critique and criticism on the following five tenets, which they further elaborate in detail in the article:

    1. A giant step backward in the programming paradigm for large-scale data intensive applications
    2. A sub-optimal implementation, in that it uses brute force instead of indexing
    3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago
    4. Missing most of the features that are routinely included in current DBMS
    5. Incompatible with all of the tools DBMS users have come to depend on

    If you take the time to read the article you'll find they use axiomatic arguments with lemmas like: "schemas are good", and "Separation of the schema from the application is good, etc. First, they make the assumption that these points are relevant and germaine to MapReduce. But, they mostly aren't.

    Also taking the five tenets listed, here are my observations:

    1. A giant step backward in the programming paradigm for large-scale data intensive applications

      they don't offer any proof, merely their view... However, the fact that Google used this technique to re-generate their entire internet index leads me to believe that is this were indeed a giant step backward, we must have been pretty darned evolved to step "back" into such a backwards approach

    2. A sub-optimal implementation, in that it uses brute force instead of indexing

      Not sure why brute force is such a poor choice, especially given what this technique is used for. From wikipedia:

      MapReduce is useful in a wide range of applications, including: "distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation..." Most significantly, when MapReduce was finished, it was used to completely regenerate Google's index of the World Wide Web, and replaced the old ad hoc programs that updated the index and ran the various analyses.
    3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago

      Again, not sure why something "old" represents something "bad". The most reliable rockets for getting our space satellites into orbit are the oldest ones.

      I would also argue their bold approach to applying these techniques in such a massively aggregated architecture is at least a little novel, and based on results of how Google has used it, effective.

    4. Missing most of the features that are routinely included in current DBMS

      They're mistakenly assuming this is for database programming

    5. Incompatible with all of the tools DBMS users have come to depend on

      See previous bullet

    Are these guys just trying to stake a reputation based on being critical of Google?

    1. Re:may be missing the (data)points by Anonymous Coward · · Score: 5, Funny

      You missed points 6 through 9:

      6. New things are scary.
      7. Google is on their lawn.
      8. Matlock is the best television show ever.

    2. Re:may be missing the (data)points by mishabear · · Score: 5, Interesting

      > I don't know why this article is so harshly critical of MapReduce.
      > Are these guys just trying to stake a reputation based on being critical of Google?

      Um... yes?

      The Database Column is being coy about being a corporate blog for Vertica, a high performance database database product, but in fact it is. Vertica is a commercial implementation of C-Store and was founded by Michael Stonebraker, the most prominent proponent of column based databases (get it? the database column). So yes, they have a very good reason to be hostile to Google.

      http://www.vertica.com/company/leadership
      http://en.wikipedia.org/wiki/C-Store
      http://en.wikipedia.org/wiki/Michael_Stonebraker
      http://www.databasecolumn.com/2007/09/contributors.html

  2. In related news: Screwdrivers suck because... by DragonWriter · · Score: 5, Funny

    1) They don't look like hammers,
    2) They don't work like hammers,
    3) You can already drive in a screw with a hammer,
    4) They aren't good at ripping out nails, and
    5) They aren't good at driving nails.

    Brought to you by The Hammer Column, a blog written by experts in the hammer industry, and launched by Hammertron, makers of a revolutionary new kind of hammer.