Google Switching To EXT4 Filesystem

← Back to Stories (view on slashdot.org)

Google Switching To EXT4 Filesystem

Posted by timothy on Thursday January 14, 2010 @08:50AM from the make-money-with-open-source dept.

An anonymous reader writes "Google is in the process of upgrading their existing EXT2 filesystem to the new and improved EXT4 filesystem. Google has benchmarked three different filesystems — XFS, EXT4 and JFS. In their benchmarking, EXT4 and XFS performed equally well. However, in view of the easier upgrade path from EXT2 to EXT4, Google has decided to go ahead with EXT4."

9 of 348 comments (clear)

Time for a backup? by Itninja · 2010-01-14 08:52 · Score: 5, Informative

I guess now is as good as any to go through my Gmail and Google Docs and make local backups. I'm sure my info is safe, but I have been through these types of 'upgrades' at work before and every once in a while....well, let's just say backups are never a bad idea.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
1. Re:Time for a backup? by paradigm82 · 2010-01-14 09:16 · Score: 5, Funny
  
  It's probably nothing, probably. But I'm getting a small discrepancy in the file sizes...no, no, it's well within acceptable limits. Continue to stage 2.
2. Re:Time for a backup? by tool462 · 2010-01-14 09:17 · Score: 5, Funny
  
  I usually let the bit-gods decide what data I have that is important enough to save. Over the years the bit-gods have taught me that:
  Music files: not important, Styx crossed the Styx to /dev/null in 2002
  Essay written for sophomore year high school english: Important, I assume to haunt me in some future political race.
  Porn collection: Like the subject matter within, it swells impressively, explodes, then enters a refractory period until it's ready to build up again.
  C++ program that graphs the Mandelbrot set: Important. I like feeling like an explorer navigating the cardioid's canyons.
  Photos of my children: Not important. If I need more baby photos, I can just have more babies.
3. Re:Time for a backup? by Anonymous Coward · 2010-01-14 09:39 · Score: 5, Funny
  
  Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.
  The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.
  And don't forget that you have to use Web Services to access The Cloud. Nothing is more secure than SOA and Web Services, with the exception of perhaps SaaS. But I think that Cloud Services 2.0 will combine the tiers into an MVC-compliant stack that uses SaaS to increase the security and partitioning of the data.
  My main concern isn't with the security of The Cloud, but rather with getting my Indian team to learn all about it so we can deploy some first-generation The Cloud applications and Web Services to provide the ultimate platform upon which we can layer our business intelligence and reporting, because there are still a few verticals that we need to leverage before we can move to The Cloud 2.0.
Digitzor link uesless by autocracy · 2010-01-14 08:58 · Score: 5, Informative

I managed to ease a pageview out of it. That said, the /. summary says all they say, and you're all better served by the source they point to, which is what SHOULD have been in the article summary instead of the Digitzor site.
See http://lists.openwall.net/linux-ext4/2010/01/04/8

--
SIG: HUP
Re:Btrfs? by Paradigm_Complex · 2010-01-14 09:11 · Score: 5, Informative

From kernel.org's BTRFS page:

Btrfs is under heavy development, and is not suitable for any uses other than benchmarking and review. The Btrfs disk format is not yet finalized, but it will only be changed if a critical bug is found and no workarounds are possible.
It's ready for benchmarking, it's just not ready for widespread use yet. If Google was looking for a filesystem to make a switch to in the near future, BTRFS simply isn't an option quite yet.

It's really easy at this point to move from EXT2 to EXT4 (I believe you can simply remount the partition as the new filesystem, maybe change a flag or two, and away you go). It's basically free performance. If Google is convinced it's stable, there isn't much reason not to do this. It could act as an interim filesystem until something significantly better - such as BTRFS - gets to the point where it's dependable. The fact BTRFS was not mentioned here doesn't mean it's completely ruled out.

--
"A witty saying proves nothing." - Voltaire
Re:No ReiserFS? by jspenguin1 · 2010-01-14 09:35 · Score: 5, Funny

They need to change the name... How about
Object-oriented
Journalled
File
System?
Re:Has Ted Cooked the Benchmarks Again? by tytso · 2010-01-14 12:11 · Score: 5, Informative

So I'm not sure what you're talking about. If you're talking about delayed allocation, XFS has it too, and the same buggy applications that don't use fsync() will also lose information after a buggy proprietary Nvidia video driver crashes your machine, regardless of whether you are using XFS or ext4.
If you are talking about the change to _ext3_ to use data=writeback, that was a change that Linus made, not me, and ext4 has always defaulted to data=ordered. Linus thought that since the vast majority of Linux machines are single-user desktop machines, the performance hit of data=ordered, which is designed to prevent exposure of uninitialized data blocks after a crash wasn't worth it. I and other file system engineers disagreed, but Linus's kernel, Linus's rules. I pushed a patch to ext3 which makes the default a config option, and as far as I know the enterprise distro's plan to use this config option to keep the defaults the same as before for ext3.
Since it was my choice, I actually changed the defaults for ext4 to use barriers=1. which Andrew Morton vetoed for ext3 because again, he didn't think it was worth the performance hit. But with ext4, the benefits of delayed allocation and extents are so vast that it completely dominated the performance hit of turning on write barriers. That is what most of the performance benefits for ext4 come from, and it is very much a huge step forward compared to ext3.
So with respect, you don't know what you are talking about.
-- Ted
Re:Google doesn't need journaling? by Jeff- · 2010-01-14 19:31 · Score: 5, Informative

There's a lot of misinformation in this thread about softupdates. I only have so much time to reply so I'll hit a few key points. I'm the author of journaling extensions to softupdates so I have some experience in this area.
This notion that softupdates was so complex and so inhibited new features in ffs is bogus. I've seen it repeated a few times. There simply was not much pressure for these features and the filesystem metadata did not support it until ufs2. The total amount of code dedicated to extended attributes in softupdates can't be more than 100 lines. ffs sees fewer features because we have fewer developers period.
Furthermore, softupdates is just a different approach. It is no more complex than journaling. When I review a sophisticated journaling implementation such as xfs I see more lines of code dedicated to journaling and transaction management than softupdates requires for dependency tracking. I have worked on a number of production filesystems and while softdep is definitely not trivial, neither were any of the others unless you compare to synchronous ufs. I think a lot of people who are familiar with COW and Journaling are looking at this unfairly because they already know another system and forget how long it took to become comfortable with it.
In cpu benchmarks softdep costs more than async ffs, this is true. However, rollbacks are actually quite infrequent because our buffercache attempts to write buffers without dependencies first. Generally there are enough of those which satisfy dependencies on other buffers that you can keep the pipeline busy. Looking at the code size and depth in any modern filesystem it's clear that a lot of cpu is involved. Are journal blocks not consuming memory? Is the transaction tracking free? Most dependency structures are quite small compared to generating a copy of a metadata block for a jouranl write.
NetBSD abandoned softdep for something much simpler because they didn't have the resources to fix the bugs in it and they didn't incorporate fixes from FreeBSD. Their journaling implementation is similar to our gjournal which is mostly filesystem agnostic and does full block logging in a very simple fashion.
The journaled filesystem project was started simply to get rid of fsck. I think this hybrid solution is very promising. It gives us a place to issue barriers which can affect arbitrary numbers of filesystem operations. The journal write overhead is much lower than with traditional journals.
And regarding benchmarks; FreeBSD doesn't really have a comparably developed journaling filesystem to benchmark softdep against. I think it's unreasonable to compare linux with ext4 to FreeBSD with ffs+softdep for purposes of evaluating the filesystem design. Too many other factors come into play.
You can read more about softdep journaling at http://jeffr_tech.livejournal.com/
Thanks,
Jeff