Slashdot Mirror


The Design Of The Google File System

Freddles writes "This is an interesting paper (PDF) describing the design approach to Google's file system. The design had to take account of requirements for huge file sizes, a highly responsive infrastructure and an assumption that hardware components will always fail."

9 of 210 comments

  1. In case you don't like PDF by Brahmastra · · Score: 4, Informative

    Here's the html link

  2. PDF mirror by Tyler+Eaves · · Score: 4, Informative

    PDF mirror on my server /Feels sorry for the Rochester cs server

    --
    TODO: Something witty here...
  3. Just to make it clear.. by Doodhwala · · Score: 4, Informative


    Okay, so I read this paper as a part of the SOSP reading group here at school. Just want to make it clear that this is not the file system used by the front end that we all see. It is used by internal dev groups as well as the web spiders that they employ. Their unique usage has definitely led to a number of interesting choices (such as the atomic appends) for the file system design. Read the paper for more details :-)

    1. Re:Just to make it clear.. by Doodhwala · · Score: 3, Informative

      And if you read that statement, it does not mention the front-end. Generation and processing all takes place offline as most of the query results are only updated once a month (the Google-dance). And this question was asked of Howard Gobioff (one of the co-authors) at a presentation on the Google File System (GFS) at Carnegie Mellon.

  4. html version by kaan · · Score: 3, Informative

    thanks to, ehh, Google, here's an html version of the article

    I didn't read the whole article (kinda lengthy) but it seems pretty informative. I found their assumptions interesting, as they reveal some of the essence of what makes Google such a great search tool. Here are a few from the article:

    - The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.

    - High sustained bandwidth is more important than low latency. Most of our target applications place a premium on processing data in bulk at a high rate, while few have stringent response time requirements for an individual read or write.

    - The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. Successive operations from the same client often read through a contiguous region of a file.

  5. Fabulous Insights by dolo666 · · Score: 4, Informative

    I really enjoyed that read about the file system Google uses. The fact that they usually append to their files is of special note. By appending data you only need to know a simple pointer address. Seems quick enough. Add a bunch of threaded concurrent writes and you could get into trouble on other systems... The "atomic append" seems interesting because it lets multiple machines append to the same file simultaneously (hazard free).
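    To make the "atomic append" idea concrete, here is a minimal single-process sketch. It assumes the behaviour the paper describes (the caller supplies only the data, and the system picks the offset and returns it); the `Chunk` class, `record_append`, and `run_writers` are hypothetical names for illustration, and a thread lock stands in for whatever coordination the real system does across machines.

```python
import threading


class Chunk:
    """Toy stand-in for one chunk of a file; an illustration, not GFS code."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append the record as one unit and return the offset chosen for it."""
        with self._lock:
            offset = len(self._data)  # the chunk, not the caller, picks the offset
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        """Return `length` bytes starting at `offset`."""
        with self._lock:
            return bytes(self._data[offset:offset + length])


def run_writers(chunk, n_writers=8, per_writer=100):
    """Start several threads that append concurrently; return (offset, record) pairs."""
    results = []
    results_lock = threading.Lock()

    def writer(writer_id):
        for i in range(per_writer):
            record = f"w{writer_id}-r{i};".encode()
            offset = chunk.record_append(record)
            with results_lock:
                results.append((offset, record))

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    Every writer gets back a distinct offset and can read its own record back intact, without the writers having to agree on positions among themselves beforehand.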

    64meg chunk size is pretty huge, but I'm guessing that's blocked out based on continual threads of data, not typical files.
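    For what it's worth, a fixed chunk size makes locating data simple arithmetic: a byte offset in a file maps to a chunk index and an offset inside that chunk. The sketch below assumes only the 64 MB figure from the paper; `locate` and `chunks_for_range` are hypothetical helper names, not anything from GFS itself.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size discussed above


def locate(offset):
    """Return (chunk index, offset within that chunk) for a byte offset in a file."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset, length):
    """List the chunk indexes touched by reading `length` bytes from `offset`."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

    A large streaming read walks through consecutive chunk indexes, which fits the "contiguous region of a file" workload the paper assumes.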

    At first glance, this file system seems fairly wasteful. But hey, Google likely require speed and reliability over cost. Right?

    This reminds me of the discussions about not-so-far-off database filesystems coming to an OS near you.

  6. Re:Word processor? by Saunalainen · · Score: 2, Informative

    The PDF file claims to have been made by dvips, so it was written in LaTeX. It was then converted to PDF using Distiller.

  7. Prevayler anyone? by 12357bd · · Score: 2, Informative

    The in-memory master behaviour described in the paper closely resembles the Prevayler software.
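    The shared pattern, roughly: keep all state in memory, write every change to a log before applying it, and rebuild the state after a restart by replaying the log. Below is a minimal sketch of that pattern under stated assumptions; `LoggedState`, its operation names, and the JSON-lines log format are all invented for illustration and are neither Prevayler's API nor the GFS master's actual design (checkpoints/snapshots are left out).

```python
import json
import os


class LoggedState:
    """In-memory state made durable by an append-only operation log (illustrative)."""

    VALID_OPS = ("create", "add_chunk")

    def __init__(self, log_path):
        self.log_path = log_path
        self.files = {}  # file name -> list of chunk ids, held entirely in memory
        # Recovery: replay every logged operation in order.
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, op):
        """Mutate the in-memory state; must be deterministic so replay matches."""
        if op["op"] == "create":
            self.files.setdefault(op["name"], [])
        elif op["op"] == "add_chunk":
            self.files.setdefault(op["name"], []).append(op["chunk"])

    def execute(self, op):
        """Validate, log the operation durably, and only then apply it in memory."""
        if op.get("op") not in self.VALID_OPS:
            raise ValueError(f"unknown operation: {op!r}")
        with open(self.log_path, "a") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(op)
```

    Constructing a second `LoggedState` on the same log path yields the same `files` mapping as the first, which is the whole point of the approach.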

    --
    What's in a sig?
  8. LaTeX is not a word processor by maxmg · · Score: 3, Informative

    It's more of a "text compiler" where you concentrate on writing the content and leave all of the formatting to a template that is responsible for transforming the content into (normally PostScript) output. Anybody who has worked with LaTeX and then moved to Word, only to have that stupid piece of sh*t bunch all images in a document together, on top of each other, on the first or last page of their document, will appreciate the LaTeX workflow. And LaTeX absolutely rocks when it comes to formulas.

    That being said, LaTeX comes with a significant learning curve, and due to its nature misses some of the features that are important in a business environment (most notably change tracking). There are some pseudo-WYSIWYG frontends for LaTeX, such as LyX, but they are firmly targeted at an academic audience. Most scientific papers require submissions in .ps format, processed with a specified LaTeX template (at least they did when I was at Uni).

    --
    I asked for a refund - and got my monkey back.