The Design Of The Google File System
Freddles writes "This is an interesting paper (PDF) describing the design approach to Google's file system. The design had to take account of requirements for huge file sizes, a highly responsive infrastructure and an assumption that hardware components will always fail."
Here's the HTML link
It was thoughtful of the poster to link to google.com for those who have never heard of it.
Google uses MS Access as a backend to store all of its cache files. It gets its redundancy from a batch file, scheduled with the Windows "at" command, that uses "xcopy" to copy the data to another backup server.
PDF mirror on my server /Feels sorry for the Rochester cs server
TODO: Something witty here...
Okay, so I read this paper as part of the SOSP reading group here at school. Just want to make it clear that this is not the file system used by the front end that we all see. It is used by internal dev groups as well as the web spiders that they employ. Their unique usage has definitely led to a number of interesting choices (such as the atomic appends) for the file system design. Read the paper for more details.
I'd like to see a beow...
Never mind.
Luckily the world was saved from this possibility.
-John (now, one of those "why, back in my day..." story telling guys... sigh.)
Self Serving Sig: Hosting Comparison
I need something for my p...err, book collection.
The Mothership
Surely you mean "WEEEEE~1.EEE".
Ummm... not very many. Then again, I try not to search on "teen panties" very often. :)
That reminds me of the winter I spent in Chicago. I needed some galoshes to protect my shoes and keep my feet dry. Back in New England, we called them "rubbers" (I am not making this up). Needless to say, a google search on "buy rubbers" did not yield the intended results.
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
I really enjoyed that read about the file system Google uses. The fact that they usually append to their files is of special note. To append data, you only need to know a simple pointer address: the current end of the file. Seems quick enough. Add a bunch of threaded concurrent writes and you could get into trouble on other systems... The "atomic append" seems interesting because it lets multiple machines append to the same file simultaneously (hazard-free).
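For anyone who wants to see the append idea in miniature, here is a rough sketch. It is not GFS code: `Chunk`, `record_append` and `run_writers` are names made up for illustration, and plain threads with a lock stand in for separate machines. The one thing it tries to show is that the writer hands over only the data, and the store picks the offset and reports it back.

```python
import threading


class Chunk:
    """Toy in-memory stand-in for one chunk (hypothetical, not GFS code)."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        # The writer supplies only the data. The chunk chooses the offset
        # (the current end) and returns it, all under one lock, so two
        # concurrent appends can never land on the same bytes.
        with self._lock:
            offset = len(self._data)
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def run_writers(chunk: Chunk, n_writers: int, records_per_writer: int):
    """Start several writer threads; return a list of (offset, record) pairs."""
    results = []
    results_lock = threading.Lock()

    def writer(writer_id: int) -> None:
        for i in range(records_per_writer):
            record = f"w{writer_id}-r{i};".encode()
            offset = chunk.record_append(record)
            with results_lock:
                results.append((offset, record))

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every record ends up intact at the offset it was given, no matter how the threads interleave. The real system also has to deal with replicas and failed appends, which this sketch ignores completely.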
A 64 MB chunk size is pretty huge, but I'm guessing that's sized for long continuous streams of data, not typical files.
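To put the chunk size in perspective, here is a back-of-the-envelope sketch of how a byte offset in a big file maps onto fixed-size 64 MB chunks. The function names `locate` and `chunks_spanned` are my own, just for illustration; only the 64 MB figure comes from the paper.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size quoted in the paper


def locate(byte_offset: int):
    """Return (chunk_index, offset_within_chunk) for a byte offset in a file."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_spanned(offset: int, length: int) -> range:
    """Return the range of chunk indexes touched by a read of `length` bytes."""
    if length <= 0:
        return range(0)
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return range(first, last + 1)
```

So a read at the 200 MB mark lands 8 MB into chunk 3, and a 1 GB file needs only 16 chunks' worth of bookkeeping, which is presumably part of the appeal.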
At first glance, this file system seems fairly wasteful. But hey, Google likely requires speed and reliability over cost. Right?
This reminds me of the discussions about not-so-far-off database filesystems coming to an OS near you.
...the Linux kernel will have googlefs support. It will be marked (EXPERIMENTAL), though, and will only run on 10,000-node Babelfish clusters...
Honey, I shrunk the Cygwin
... which may not have happened from just any company of Google's prominence. I mean, they have highly successful business and technical infrastructure models and they didn't HAVE to share them with anyone.
I wonder what they believe will protect their business from poaching of these ideas?
In case Google gets slashdotted, here is the Google cache for Google.