Slashdot Mirror


The Design Of The Google File System

Freddles writes "This is an interesting paper (PDF) describing the design approach to Google's file system. The design had to take account of requirements for huge file sizes, a highly responsive infrastructure and an assumption that hardware components will always fail."

6 of 210 comments

  1. Interesting... by petermdodge · · Score: 3, Insightful

    It's an interesting enough read; it's certainly interesting to see how one of the highest-volume server operations out there copes. Now, the question is: what can we little server guys do to implement these ideas on our own servers? What can we take from it?

    --


    Peter M. Dodge,
    Chief Executive Officer,
    LiquidFire Studios

    Platinum Linux - www.
  2. Re:Just to make it clear.. by Klaruz · · Score: 3, Insightful

    Could you cite your source please? In the first page of the paper linked:

    "It is widely deployed within Google for the generation and processing of data used by our service as well as research and development that requires large data sets."

  3. Re:Thoughtful... by xoboots · · Score: 4, Insightful

    There's a reason not every search engine is considered the same. Try a simple search for a popular item. I searched for "PHP" on the three sites you mentioned. The top returned results are as follows:

    Google:
    - top result: php.net
    - 2nd place was php.net/downloads

    AllTheWeb:
    - top result: Hands-On PHP Training - 4 days $1695 (also ranked #10 on Turbo10, but not ranked in the top 20 at Google) -- oops, that is a sponsored link, but in AllTheWeb's default view, it looks like a normal link. php.net is actually ranked #1, but it appears 4th in the list of available links.

    Turbo10:
    - will not provide ANY results without Javascript turned on (BOO!)
    - top result: GBF Masonry Cleaning Services..Stone Cleaning
    - php.net ranked 5

    Draw your own conclusions, but meta-search engines existed before Google, yet even at its launch Google outperformed them at providing relevant links. It appears that it still does. At least for a first pass :)

    I suspect that one of the reasons Google can bring higher-quality links to the forefront is that, being #1, it has a broader and more generous revenue base and therefore doesn't have to be as accommodating to "paying patrons" *cough cough*.

    Another problem is that meta engines have to mix "high-quality" results (say, from Google) with lower-quality results (say, from some dippy paid-advertising search engine).

  4. Re:they published it ... by Anonymous Coward · · Score: 2, Insightful

    The catch-up law.

    Basically, it says that if you spend all your time playing catch-up, you'll never be first.

    If the other search engines use the Google FS, then you know they aren't the leader. Sort of like if kernel.org was running Windows 2003 or if www.msn.com was running on Linux.

    Now, if they go and create a FS just so they can be the same as Google, then they are only catching up. Once they catch up to Google, Google will be somewhere else.

    The other thing is that there are lots of clustered file systems around, so it's not like Google has the only one. They've just optimised one for their needs.

    Basically, if the other companies copy the idea, it would take them at least a year to get it working. By then, the Google FS will have more features, or Google may have moved on to another bottleneck, e.g. Google NUMA or the like.

  5. Re:they published it ... by hankaholic · · Score: 4, Insightful

    I wonder what they believe will protect their business from poaching of these ideas?

    Perhaps the fact that it's taken many very smart people a good amount of time to implement and tune the original design, even after having come up with the basic layout?

    Go take a look at the ReiserFS Future Vision page -- you'll see some more interesting discussion of filesystem design, and overall direction. There are a few solid developers working full-time on the concepts discussed in the Reiser docs, and they still have enough work to keep them busy for years to come.

    Google releasing information regarding the structure of their systems is a bit like John Carmack discussing the structure of his graphics engines: there's a hell of a distance between a conceptual description and a fine-tuned, tested, working implementation.

    Given Google's history, I'd also imagine that they're on the lookout for up-and-coming young researchers. As such, if some grad student takes their work and extends it, they can certainly benefit.

    --
    Somebody get that guy an ambulance!
  6. What a waste.... by abramsh · · Score: 2, Insightful

    Should have just bought one of these: SGI SAN 3000. It would be easier and cheaper to manage, would scale better, and you wouldn't have to spend the money to create and maintain the file system.