Google Switching To EXT4 Filesystem

← Back to Stories (view on slashdot.org)

Google Switching To EXT4 Filesystem

Posted by timothy on Thursday January 14, 2010 @08:50AM from the make-money-with-open-source dept.

An anonymous reader writes "Google is in the process of upgrading their existing EXT2 filesystem to the new and improved EXT4 filesystem. Google has benchmarked three different filesystems — XFS, EXT4 and JFS. In their benchmarking, EXT4 and XFS performed equally well. However, in view of the easier upgrade path from EXT2 to EXT4, Google has decided to go ahead with EXT4."

9 of 348 comments (clear)

Min score:

Reason:

Sort:

No ReiserFS? by CRCulver · 2010-01-14 09:00 · Score: 3, Interesting

It's interesting that ReiserFS wasn't even an option here. I myself even ended up using Ext4 when I set up a new box not too long ago. It's a real shame that just because the creator of the filesystem committed a crime, people are drawn to treat the technology itself are somehow dishonored.
1. Re:No ReiserFS? by mqduck · 2010-01-14 11:51 · Score: 3, Interesting
  
  So it's not because the creator of the filesystem committed a crime, it's because the product has an unsavoury name
  Actually, it's more likely because the creator and main developer of the filesystem is suddenly gone. As I understand it, he wasn't a very friendly guy (surprise!) and drove others away from the project.
  
  --
  Property is theft.
Google doesn't need journaling? by Paradigm_Complex · 2010-01-14 09:00 · Score: 3, Interesting

The main advantage of EXT3 over EXT2 is that, with journaling, if you ever need to fsck the data, it goes a LOT quicker. It's interesting to note that Google never felt it needed that functionality.

Additionally, I was under the impression that Google used massive numbers of commodity consumer-grade harddrives, as opposed to high-grade stuff which I presume is less likely to err. Couple this fact with the massive amount of data Google is working with and there has got to be a lot of filesystem errors, no?

Can anyone else with experience with big database stuff hint as to why Google would not need to fsck their data (often enough for EXT3 to be worthwhile)? Is it cheaper just to overwrite the data from some backup elsewhere at this scale? How do they know the backup is clean without fscking that?

--
"A witty saying proves nothing." - Voltaire
1. Re:Google doesn't need journaling? by tytso · 2010-01-14 11:55 · Score: 4, Interesting
  
  So there's a major problem with Soft Updates, which is that you can't be sure that data has hit the disk platter and is on stable store unless you issue a barrier operation, which is very slow. What Soft Updates apparently does is assume that once the data is sent to the disk, it is safely on the disk. But that's not a true assumption! The disk drive, especially modern ones with large caches, can reorder writes which are sent to the disk, sometimes (with the right pathological workloads) for minutes at a time. You won't notice this problem if you just crash the kernel, or even if you hit the reset button. But if you pull the plug or otherwise cause the system to drop power, data in the disk's write cache won't necessarily be written to disk. The problem that we saw with journal checksums and ext4 only showed up on a power drop, because there was a missing barrier operation, so this is not a hypothetical consideration.
  In addition, if you have a very heavy write workload, the Soft Updates code will need to burn a fairly large amount of memory tracking the dependencies and burn quite a bit of CPU figuring out which dependencies need to be rolled back. I'm a bit suspicious of how well they perform and how much CPU they steal from applications --- which granted, may not show up in benchmarks which are disk bound. But if the applications or the large number of jobs running on a shared machine are trying to use lots of CPU as well as disk bandwidth, this could very much be an issue.
  BTW, while I was doing some quick research for this reply. it seems that NetBSD is about to drop Soft Updates in favor of a physical block journaling technology (WAPBL), according to Wikipedia. They didn't get a reference to this, nor did they say why NetBSD was planning on dropping Soft Updates, but there is a description of the replacement technology here: http://www.wasabisystems.com/technology/wjfs. But if Soft Updates is so great, why is NetBSD replacing it and why did Free BSD add file system journaling alternative to UFS?
It's Not Hans by TheNinjaroach · 2010-01-14 09:06 · Score: 4, Interesting

I too have abandoned using ReiserFS but it's not about the horrible crime Hans committed. It's about the fact I don't think the company that he owned (who developed ReiserFS) has a great future, so I foresee maintenance problems with that filesystem. Sure, somebody else can continue their work but I'm not going to hold my breath.

--
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
Re:Not A Nerd? by MBGMorden · 2010-01-14 09:11 · Score: 3, Interesting

I too found it interesting, because it basically alleviates any need for me to worry about "upgrading" to ext4. My current Linux systemse use an ext3 /boot partition and everything else xfs. Given some of the press ext4 has gotten lately, I just trust xfs more, and knowing that I'm not really giving up any performance is a huge plus.
Truthfully though, where the heck are the meta-data based filesystems that we were promised? I've love to be able to, on a filesystem level, instantly pull up a folder view of all videos - or all images. Or all images of my dog. Or all images outdoors. Or all images of my dog outdoors.
Basically, just the ability to organize via an arbitrary number of categorized tags.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
XFS performance highly variable by bzipitidoo · 2010-01-14 09:14 · Score: 3, Interesting

I've used XFS on a RAID1 setup with SATA drives, and found the performance of the delete operation extremely dependent on how the partition was formatted.
I saw times of up to 5 minutes to delete a Linux kernel source tree on a partition that was formatted XFS with the defaults. Have to use something like sunit=64, swidth=64, and even then it takes 5 seconds to rm -rf /usr/src/linux. I've heard that SAS drives wouldn't exhibit this slowness. Under Reiserfs on the same system, the delete took 1 second. Anyway, XFS is notorious for slow delete operations.

--
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
Ubuntu 9.10? by GF678 · 2010-01-14 09:36 · Score: 4, Interesting

Gee, I hope they're not using Ubuntu 9.10 by any chance: http://www.ubuntu.com/getubuntu/releasenotes/910

There have been some reports of data corruption with fresh (not upgraded) ext4 file systems using the Ubuntu 9.10 kernel when writing to large files (over 512MB). The issue is under investigation, and if confirmed will be resolved in a post-release update. Users who routinely manipulate large files may want to consider using ext3 file systems until this issue is resolved. (453579)
The damn bug is STILL not fixed apparently. Some people get the corruption, and some don't. Scares me enough to not even try using ext4 just yet, and I'm still surprised Canonical was stupid enough to have ext4 as the default filesystem in Karmic.
Then again, perhaps Google knows what they're doing.
Re:Not A Nerd? by smash · 2010-01-14 12:18 · Score: 3, Interesting

You can use ZFS. Just run FreeBSD or opensolaris. The amount of software that runs on Linux but not FreeBSD (particularly if you're talking about open-source) is exceedingly minimal.

--
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.