Slashdot Mirror


The Design Of The Google File System

Freddles writes "This is an interesting paper (PDF) describing the design approach to Google's file system. The design had to take account of requirements for huge file sizes, a highly responsive infrastructure and an assumption that hardware components will always fail."

19 of 210 comments

  1. In case you don't like PDF by Brahmastra · · Score: 4, Informative

    Here's the html link

    1. Re:In case you don't like PDF by Short+Circuit · · Score: 5, Funny

      A ramdisk would make for a great swap partition. :)

  2. Thoughtful... by Anonymous Coward · · Score: 5, Funny

    It was thoughtful of the poster to link to google.com for those that have never heard of it.

    1. Re:Thoughtful... by Queuetue · · Score: 5, Funny

      Absolutely - I was about to go look google up on teoma and askjeeves...

    2. Re:Thoughtful... by Anonymous Coward · · Score: 5, Funny

      Last week I had a co-worker ask how to spell it. He is MS cert'd for Win2k Pro. Don't mod this funny, it's sad.

    3. Re:Thoughtful... by xoboots · · Score: 4, Insightful

      There's a reason not every search engine is considered the same. Try a simple search for a popular item. I searched for "PHP" on the three sites you mentioned. The top returned results are as follows:

      Google:
      - top result: php.net
      - 2nd place was php.net/downloads

      AllTheWeb:
      - top result: Hands-On PHP Training - 4 days $1695 (also ranked #10 on Turbo10, but not ranked in the top 20 at Google) -- oops, that is a sponsored link, but in AllTheWeb's default view, it looks like a normal link. php.net is actually ranked #1, but it appears 4th in the list of available links.

      Turbo10:
      - will not provide ANY results without Javascript turned on (BOO!)
      - top result: GBF Masonry Cleaning Services..Stone Cleaning
      - php.net ranked 5

      Draw your own conclusions, but meta-search engines existed before Google, and even at its launch Google outperformed them at providing relevant links. It appears that it still does. At least for a first pass :)

      I suspect that one of the reasons that Google can bring higher quality links to the forefront is that being #1, they have a wider and more generous revenue base and therefore don't have to be as generous to "paying patrons" *cough cough*.

      Another problem is that meta engines have to mix "high-quality" results (say from Google) with lower quality results (say from some dippy paid for advertising search engine).

  3. Story summary by slash-tard · · Score: 4, Funny

    Google uses MS Access as a backend to store all of its cache files. It gets redundancy from a batch file set up with the Windows "at" command to "xcopy" the data to another backup server.

  4. PDF mirror by Tyler+Eaves · · Score: 4, Informative

    PDF mirror on my server /Feels sorry for the Rochester CS server

    --
    TODO: Something witty here...

  5. Just to make it clear.. by Doodhwala · · Score: 4, Informative


    Okay, so I read this paper as part of the SOSP reading group here at school. Just want to make it clear that this is not the file system used by the front end that we all see. It is used by internal dev groups as well as the web spiders that they employ. Their unique usage has definitely led to a number of interesting choices (such as the atomic appends) in the file system design. Read the paper for more details :-)

  6. Hmmm. by Pig+Hogger · · Score: 4, Funny

    I'd like to see a beow...
    Never mind.

  7. Only a file system? by jrrl · · Score: 5, Interesting

    Back in the early days at Lycos, Danner Stodolsky, now at Akamai, used so many weird little tricks to make things faster that we used to joke that we'd end up with a custom operating system. The supposed name? LycOS.

    Luckily the world was saved from this possibility.

    -John (now, one of those "why, back in my day..." story telling guys... sigh.)

    --
    Self Serving Sig: Hosting Comparison

  8. Is it open source? by The+Ancients · · Score: 4, Funny

    I need something for my p...err, book collection.

  9. Re:You mean FAT don't cut it no more? by Wumpus · · Score: 4, Funny

    Surely you mean "WEEEEE~1.EEE".

  10. Re:great. now, deal with the spam issue by winkydink · · Score: 4, Funny

    how many times have you searched for something on google, only to find that the search engine spammers have taken over almost every top 10 result?

    Ummm... not very many. Then again, I try not to search on "teen panties" very often. :)

    That reminds me of the winter I spent in Chicago. I needed some galoshes to protect my shoes and keep my feet dry. Back in New England, we called them "rubbers" (I am not making this up). Needless to say, a google search on "buy rubbers" did not yield the intended results.

    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

  11. Fabulous Insights by dolo666 · · Score: 4, Informative

    I really enjoyed that read about the file system Google uses. The fact that they usually append to their files is of special note. By appending data, a writer only needs to know a simple pointer: the current end of the file. Seems quick enough. Add a bunch of threaded concurrent writes and you could get into trouble on other systems... The "atomic append" seems interesting because multiple machines can append to the same file simultaneously (hazard free).
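
    A toy model of the concurrent append idea described above, as a minimal sketch under stated assumptions: writers never choose an offset themselves; a single coordinator (in GFS, the primary replica for the chunk) serializes the appends, picks the offset, and hands it back to the writer. Threads stand in for client machines, and `ToyChunk` and `record_append` are hypothetical names. Replication, chunk padding, and retries are deliberately left out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyChunk:
    """Hypothetical stand-in for one chunk whose primary serializes appends."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()  # plays the role of the primary choosing one order

    def record_append(self, record: bytes) -> int:
        # The caller supplies only the bytes; the offset is chosen here,
        # under the lock, so two writers can never be handed the same offset.
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def main() -> None:
    chunk = ToyChunk()
    records = [f"rec-{i:03d};".encode() for i in range(100)]
    # Many concurrent "clients" append without coordinating with each other.
    with ThreadPoolExecutor(max_workers=8) as pool:
        offsets = list(pool.map(chunk.record_append, records))
    # Every record sits intact at the offset it was given back.
    for rec, off in zip(records, offsets):
        assert chunk.read(off, len(rec)) == rec
    print(f"{len(records)} records appended, {len(set(offsets))} distinct offsets")


if __name__ == "__main__":
    main()
```

    One caveat the sketch ignores: GFS only promises that a record append lands atomically at least once, so a client that retries after a failure can leave duplicate records that readers must tolerate.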

    The 64 MB chunk size is pretty huge, but I'm guessing that's chosen for long, continuous streams of data, not typical files.
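
    With a fixed chunk size, a client can turn any byte offset in a file into a chunk index plus an offset inside that chunk using plain integer division, before asking the master where that chunk lives. A minimal sketch of that arithmetic, assuming the paper's 64 MB chunks; `locate` and `chunks_spanned` are hypothetical helper names, not GFS APIs.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size


def locate(byte_offset: int) -> tuple[int, int]:
    """Map a file byte offset to (chunk index, offset within that chunk)."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_spanned(byte_offset: int, length: int) -> range:
    """Chunk indices touched by a read of `length` bytes starting at byte_offset."""
    if length <= 0:
        return range(0)
    first, _ = locate(byte_offset)
    last, _ = locate(byte_offset + length - 1)
    return range(first, last + 1)


if __name__ == "__main__":
    print(locate(200 * 1024 * 1024))  # (3, 8388608): 8 MB into the 4th chunk
    print(list(chunks_spanned(60 * 1024 * 1024, 8 * 1024 * 1024)))  # [0, 1]
```

    The payoff of big chunks shows up in the counts: a 1 GB file is only 16 chunks, so there is far less per-chunk bookkeeping than with small blocks, at the cost of being awkward for lots of small files.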

    At first glance, this file system seems fairly wasteful. But hey, Google likely values speed and reliability over cost. Right?

    This reminds me of the discussions about not-so-far-off database filesystems coming to an OS near you.

  12. And starting with Linux 2.7... by JessLeah · · Score: 4, Funny

    ...the Linux kernel will have googlefs support. It will be marked (EXPERIMENTAL), though, and will only run on 10,000-node Babelfish clusters...

  13. they published it ... by trick-knee · · Score: 5, Interesting

    ... which may not have happened with just any company of Google's prominence. I mean, they have highly successful business and technical infrastructure models, and they didn't HAVE to share them with anyone.

    I wonder what they believe will protect their business from poaching of these ideas?

    1. Re:they published it ... by hankaholic · · Score: 4, Insightful

      I wonder what they believe will protect their business from poaching of these ideas?

      Perhaps the fact that it's taken many very smart people a good amount of time to implement and tune the original design, even after having come up with the basic layout?

      Go take a look at the ReiserFS Future Vision page -- you'll see some more interesting discussion of filesystem design, and overall direction. There are a few solid developers working full-time on the concepts discussed in the Reiser docs, and they still have enough work to keep them busy for years to come.

      Google releasing information regarding the structure of their systems is a bit like John Carmack discussing the structure of his graphics engines: there's a hell of a distance between a conceptual description and a fine-tuned, tested, working implementation.

      Given Google's history, I'd also imagine that they're on the lookout for up-and-coming young researchers. As such, if some grad student takes their work and extends it, they can certainly benefit.

      --
      Somebody get that guy an ambulance!

  14. Google cache by Skreech · · Score: 5, Funny

    In case Google gets slashdotted, here is the Google cache for Google.