Google Switching To EXT4 Filesystem
An anonymous reader writes "Google is in the process of upgrading their existing EXT2 filesystem to the new and improved EXT4 filesystem. Google has benchmarked three different filesystems — XFS, EXT4 and JFS. In their benchmarking, EXT4 and XFS performed equally well. However, in view of the easier upgrade path from EXT2 to EXT4, Google has decided to go ahead with EXT4."
I guess now is as good a time as any to go through my Gmail and Google Docs and make local backups. I'm sure my info is safe, but I have been through these types of 'upgrades' at work before and every once in a while... well, let's just say backups are never a bad idea.
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Looks like Digitizor already melted.
Eats, shoots and leaves. Read it.
News for nerds. Stuff that matters.
Not that I RTFA or anything, but I find it interesting that XFS and EXT4 both appear to be equally impressive with benchmarks, and it's implied they are both better than JFS. You must not be a nerd.
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
I managed to ease a pageview out of it. That said, the /. summary says all they say, and you're all better served by the source they point to, which is what SHOULD have been in the article summary instead of the Digitizor site.
See http://lists.openwall.net/linux-ext4/2010/01/04/8
SIG: HUP
They have Ted Ts'o of Linux filesystem fame working for them now.
"Oppression and harassment is a small price to pay to live in the land of the free." -- Montgomery Burns.
I guess they didn't consider btrfs ready enough for benchmarking yet.
It's interesting that ReiserFS wasn't even an option here. I myself even ended up using Ext4 when I set up a new box not too long ago. It's a real shame that just because the creator of the filesystem committed a crime, people are drawn to treat the technology itself as somehow dishonored.
The main advantage of EXT3 over EXT2 is that, with journaling, if you ever need to fsck the data, it goes a LOT quicker. It's interesting to note that Google never felt it needed that functionality.
Additionally, I was under the impression that Google used massive numbers of commodity consumer-grade hard drives, as opposed to high-grade stuff which I presume is less likely to err. Couple this fact with the massive amount of data Google is working with and there have got to be a lot of filesystem errors, no?
Can anyone else with experience with big database stuff hint as to why Google would not need to fsck their data (often enough for EXT3 to be worthwhile)? Is it cheaper just to overwrite the data from some backup elsewhere at this scale? How do they know the backup is clean without fscking that?
"A witty saying proves nothing." - Voltaire
Did they fix that nasty "if you have files > 512MB kiss them goodbye" bug?
I want to delete my account but Slashdot doesn't allow it.
From TFA:
In their benchmarking, EXT4 and XFS performed, as impressively as each other.
WTF kind of retarded sentence is that?! Did Rob Smith help you write that article?!
In their benchmarking of EXT4 and XFS, EACH performed as impressively as THE OTHER.
We are still using ext2 on servers. Now I have an argument: if Google is still using ext2, maybe we aren't so foolish. We might update some day but it is not yet a priority. With UPS and proper failover and backup procedures in place, I can't remember when a journaling file system would have helped us in any way. They seem great for desktops/laptops, though.
Everything I write is lies, read between the lines.
I too have abandoned using ReiserFS but it's not about the horrible crime Hans committed. It's about the fact I don't think the company that he owned (who developed ReiserFS) has a great future, so I foresee maintenance problems with that filesystem. Sure, somebody else can continue their work but I'm not going to hold my breath.
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
I've used XFS on a RAID1 setup with SATA drives, and found the performance of the delete operation extremely dependent on how the partition was formatted.
I saw times of up to 5 minutes to delete a Linux kernel source tree on a partition that was formatted XFS with the defaults. Have to use something like sunit=64, swidth=64, and even then it takes 5 seconds to rm -rf /usr/src/linux. I've heard that SAS drives wouldn't exhibit this slowness. Under Reiserfs on the same system, the delete took 1 second. Anyway, XFS is notorious for slow delete operations.
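For anyone wanting to work out those numbers rather than guess: a rough sketch, assuming the mkfs.xfs convention that sunit and swidth are given in 512-byte units and that swidth is sunit times the number of data-bearing disks. The function name and the 32 KiB chunk size are made up for illustration; a RAID1 mirror is treated as one data disk.

```python
SECTOR_BYTES = 512  # mkfs.xfs expresses sunit/swidth in 512-byte units


def xfs_stripe_params(chunk_bytes, data_disks):
    """Return (sunit, swidth) in 512-byte units for a given RAID chunk size."""
    if chunk_bytes <= 0 or chunk_bytes % SECTOR_BYTES != 0:
        raise ValueError("chunk size must be a positive multiple of 512 bytes")
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    sunit = chunk_bytes // SECTOR_BYTES
    swidth = sunit * data_disks
    return sunit, swidth


if __name__ == "__main__":
    # 32 KiB chunk on a mirror (one data-bearing disk), an assumed example
    sunit, swidth = xfs_stripe_params(32 * 1024, 1)
    print(f"mkfs.xfs -d sunit={sunit},swidth={swidth} <device>")
```

With those assumed inputs the helper lands on the same sunit=64, swidth=64 mentioned above; whether that is the right geometry for a given array is a separate question.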
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
I thought Google had their own file system named the Google File System.
http://labs.google.com/papers/gfs.html
Might this prompt someone at Google to make an installable file system driver for Windows for EXT4? Right now, there is none, because of differing inode sizes and some extra features over EXT2 that EXT4 demands I think.
I didn't read the funky article because it's been slashdotted, but I'd like to see the benchmarks properly.
Gee, I hope they're not using Ubuntu 9.10 by any chance: http://www.ubuntu.com/getubuntu/releasenotes/910
The damn bug is STILL not fixed apparently. Some people get the corruption, and some don't. Scares me enough to not even try using ext4 just yet, and I'm still surprised Canonical was stupid enough to have ext4 as the default filesystem in Karmic.
Then again, perhaps Google knows what they're doing.
When does black become white?
#CCCCCC or #888888
Is there overlap with Flamebait?
When does an otherwise 'troll' moderation-worthy comment lose out on status that could validate 19 responses, with 50% scoring +2?
Sometimes a troll is a troll, but sometimes it's just a shadow.
"In their benchmarking, EXT4 and XFS performed, as impressively as each other."
Welcome to 2001, subby. Glad you could make it this decade.
I completely understand them not jumping to XFS, though. I'd never want to convert exabytes of data from one FS to another.
what about all the people who don't even bother to log in to post as AC?
Is this why Google was down for about 30 minutes today? Did anyone else even experience this or was it a local issue?
Or, you could stop being lazy and go tweak your preferences, thereby saving the rest of us from your whining.
Linux, you magnificent bastard, I read the fucking manual!
When you run data centres around the world that are collectively the most powerful supercomputer known to man, you too can get a front page story on /. announcing your upgrade.
Until then, STFU.
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
The data path from program to disk is loooong. On a system with heavy CPU load, benchmarks on a well-tuned XFS system can fall to the same level as ext2 with defaults. Even multi-core doesn't help XFS under load; running Folding@Home at nice +19 still sucker-punched it.
JFS? It fails to scale on disk-saturated systems. However, it does have some optimizations specific to database workloads. Populating a sparse file ran fastest on my system, where XFS was a total fail.
ext3 under heavy CPU load showed degradation that appeared in the benchmarks, but was noticeable on the desktop only if I was watching for it. And ext4 (formatted, not converted from ext2/3) under load is faster than ext3 without load, when using "elevator=noop" at boot.
N.B.: The above benchmarks on my system all used external journals, except ext2 natch.
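To check which elevator a disk is actually using before comparing numbers, something like the following should do. It is a sketch assuming the usual sysfs layout, where /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets; the helper names are mine.

```python
import re
from pathlib import Path


def active_scheduler(line):
    """Pick the bracketed entry out of a line like 'noop deadline [cfq]'."""
    match = re.search(r"\[([^\]]+)\]", line)
    return match.group(1) if match else None


def scheduler_for(device):
    """Read the active scheduler for a block device name such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())
```

If the file has no bracketed entry, the helper returns None rather than guessing.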
You can configure a higher threshold; 1 should be enough to filter most ACs.
Dilbert RSS feed
I assume you mean Increase the signal-to-noise ratio. Did you mean reduce the noise floor?
Here ya go.
Quack, quack.
(See first post)
SIG FAULT: Post index out of bounds.
BREAKING NEWS:
Google switches to new softer 2-ply toilet paper to reduce employee chafing.
BeauHD. Worst editor since kdawson.
So I'm not sure what you're talking about. If you're talking about delayed allocation, XFS has it too, and the same buggy applications that don't use fsync() will also lose information after a buggy proprietary Nvidia video driver crashes your machine, regardless of whether you are using XFS or ext4.
If you are talking about the change to _ext3_ to use data=writeback, that was a change that Linus made, not me, and ext4 has always defaulted to data=ordered. Linus thought that since the vast majority of Linux machines are single-user desktop machines, the performance hit of data=ordered, which is designed to prevent exposure of uninitialized data blocks after a crash, wasn't worth it. I and other file system engineers disagreed, but Linus's kernel, Linus's rules. I pushed a patch to ext3 which makes the default a config option, and as far as I know the enterprise distros plan to use this config option to keep the defaults the same as before for ext3.
Since it was my choice, I actually changed the defaults for ext4 to use barrier=1, which Andrew Morton vetoed for ext3 because, again, he didn't think it was worth the performance hit. But with ext4, the benefits of delayed allocation and extents are so vast that they completely dominate the performance hit of turning on write barriers. That is what most of the performance benefits for ext4 come from, and it is very much a huge step forward compared to ext3.
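To see which of these settings a mounted filesystem ended up with, one option is to read the options field from /proc/mounts. This is a sketch, assuming the standard whitespace-separated format (device, mount point, type, options, ...); the function names are hypothetical. The kernel may leave an option out of that listing when it is at its default, so a missing key does not prove the feature is off.

```python
def parse_mount_options(options):
    """Turn 'rw,data=ordered' into {'rw': None, 'data': 'ordered'}."""
    parsed = {}
    for item in options.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


def options_for(mount_point, mounts_text):
    """Find the options for a mount point in /proc/mounts-style text.

    Later entries shadow earlier ones, so the last match wins.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = parse_mount_options(fields[3])
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(options_for("/", f.read()))
```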
So with respect, you don't know what you are talking about.
-- Ted
I've seen huge performance leaps for large files and directories after reinstalling my system on an ext4 partition. Ext3 was very slow to list directories containing large numbers of files, and deleting very large files took tens of seconds, during which the filesystem was hung. I couldn't remove large files while recording TV, otherwise the recording would hang and skip several seconds. No longer the case now I'm on ext4.
What's the fun in that, how would you know if somebody flames you? Half the time I get flamed, the initiating post ends up modded to +5
Apocalypse Cancelled, Sorry, No Ticket Refunds
Hello moderators?
English is not my mother tongue, sorry if I occasionally make a mistake. I guess I wasn't paying attention.
Let me add to the original discussion:
especially not some big corporation
New things are always on the horizon
Great post. Thank you for your insight!
I live in a zone where power failures are very common. While I was using EXT3 I lost data several times due to power failures, and there was even a time a disk got corrupted. After I switched to JFS the data loss has been minimal and I have never had a corrupted disk. Another thing I enjoy about JFS is that it's really quick to fsck a disk after a power failure. So is it safe to switch to EXT4?
Google has their own proprietary file system called gfs (and now gfs2). Who came up with this rubbish?
They have a special file system because of their design demands and the inherent flaws in most file systems when you cluster vast numbers of computers together.
What does the writer of this post think he will accomplish by sending out this garbage is what I want to know!
Stop blaming the applications for a filesystem problem Ted. The excuse doesn't wash no matter how many times you use it, and no, XFS does not have it.
http://en.wikipedia.org/wiki/XFS#Delayed_allocation
Any other questions? At the very least the applications are non-portable in the sense that they were depending on behavior not guaranteed by POSIX. XFS, btrfs, ZFS, and many if not most modern file systems do delayed allocation. It's one of the basic file system tricks to improve performance.
The code written in those applications has been around for years, so stop trying to blame that for a problem that only materialised recently (although the 'problem' shouldn't be new to anyone really). A filesystem blaming userspace for certain things happening and hiding behind POSIX for well known behaviour and code that should be well tested first is one of the most retarded, and worrying, things I have ever heard. Userspace is not going to be 'fixed' in this regard for reasons which should be damn obvious. No, we're not all going to switch to sqlite. Yes, small reads and writes are part and parcel of a great many applications, and will be for years to come. Granted, XFS has historically had more of a problem in this area than other filesystems but at least they have a well tested implementation that is years ahead of ext4 - not that the approach isn't more 'risque'.
It just whiffs of some backside covering, that's all. In any case 'XFS does it too!' isn't much of a defence, especially given the use cases of the ext* line of filesystems and that it is expected to be an ext2/3 replacement.
So before I tried agitating for programmers to fix their buggy applications, I had already implemented both the heuristic that XFS uses (if you truncate a file descriptor, add an implicit fsync on the close of that fd), and in addition I had implemented another heuristic (if you rename on top of an existing file, fsync the source file of the rename). This was to work around buggy applications, and as you can see, ext4 does even more than XFS does.
At the end of the day, though, the heuristic can sometimes get things wrong, and sometimes the heuristic will be too aggressive in forcing fsync()'s when it's not really necessary, which is why it's good to at least try to educate application programmers about something which even you agree shouldn't be a new thing.
(For example, if you don't fsync, and you want to run your application on another OS, like say, Solaris, you will be very sad.)
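For the application side, the usual replace-via-rename pattern with the fsync included looks roughly like this. It is a sketch only, assuming a POSIX system: the function name and the ".tmp" suffix are made up, and error handling is minimal.

```python
import os


def atomic_write(path, data):
    """Write data to a temp file, fsync it, then rename it over path."""
    tmp_path = path + ".tmp"  # hypothetical naming scheme for the temp file
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer down to the kernel
        os.fsync(f.fileno())  # ask the kernel to push the data to disk
    os.replace(tmp_path, path)  # rename over the old file
    # fsync the containing directory so the rename itself reaches disk
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The point of the explicit fsync before the rename is that the application no longer depends on the filesystem guessing its intent from a truncate or rename.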
But it wasn't backside covering. Although most people don't seem to realize it, FIRST I added the heuristics to work around the buggy code, and THEN I agitated for people to fix their d*mn code. But application programmers don't like being told that they are wrong, so this seems to be a case of "blame/shoot the messenger", with me having been cast into the role of the messenger.