Google Switching To EXT4 Filesystem
An anonymous reader writes "Google is in the process of upgrading their existing EXT2 filesystem to the new and improved EXT4 filesystem. Google has benchmarked three different filesystems — XFS, EXT4 and JFS. In their benchmarking, EXT4 and XFS performed equally well. However, in view of the easier upgrade path from EXT2 to EXT4, Google has decided to go ahead with EXT4."
I guess they didn't consider btrfs ready enough for benchmarking yet.
It's interesting that ReiserFS wasn't even an option here. I myself even ended up using Ext4 when I set up a new box not too long ago. It's a real shame that just because the creator of the filesystem committed a crime, people are drawn to treat the technology itself as somehow dishonored.
The main advantage of EXT3 over EXT2 is that, with journaling, if you ever need to fsck the data, it goes a LOT quicker. It's interesting to note that Google never felt it needed that functionality.
Additionally, I was under the impression that Google used massive numbers of commodity consumer-grade hard drives, as opposed to high-grade stuff which I presume is less likely to err. Couple this fact with the massive amount of data Google is working with and there have got to be a lot of filesystem errors, no?
Can anyone else with experience with big database stuff hint as to why Google would not need to fsck their data (often enough for EXT3 to be worthwhile)? Is it cheaper just to overwrite the data from some backup elsewhere at this scale? How do they know the backup is clean without fscking that?
"A witty saying proves nothing." - Voltaire
I too have abandoned using ReiserFS but it's not about the horrible crime Hans committed. It's about the fact that I don't think the company he owned (which developed ReiserFS) has a great future, so I foresee maintenance problems with that filesystem. Sure, somebody else can continue their work but I'm not going to hold my breath.
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
You say that like it's a good thing. One error, like an assumption about the maximum number of files or clusters that causes a wraparound, and it all goes tits up.
It's not like they haven't dropped the ball before: http://www.techcrunch.com/2006/12/28/gmail-disaster-reports-of-mass-email-deletions/
Do no evil, but be a bit incompetent sometimes.
I too found it interesting, because it basically alleviates any need for me to worry about "upgrading" to ext4. My current Linux systems use an ext3 /boot partition and everything else xfs. Given some of the press ext4 has gotten lately, I just trust xfs more, and knowing that I'm not really giving up any performance is a huge plus.
Truthfully though, where the heck are the meta-data based filesystems that we were promised? I'd love to be able to, on a filesystem level, instantly pull up a folder view of all videos - or all images. Or all images of my dog. Or all images outdoors. Or all images of my dog outdoors.
Basically, just the ability to organize via an arbitrary number of categorized tags.
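To make the "images of my dog outdoors" idea concrete, here is a minimal sketch of tag-based lookup done in user space rather than in the filesystem. The TagIndex class, its method names, and the file paths are all hypothetical and only for illustration; a query is simply the intersection of the sets of files that carry each tag.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index mapping each tag to a set of file paths."""

    def __init__(self):
        self._paths_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Attach any number of tags to a path."""
        for name in tags:
            self._paths_by_tag[name].add(path)

    def query(self, *tags):
        """Return the paths that carry every one of the given tags."""
        if not tags:
            return set()
        # .get() avoids creating empty entries for unknown tags.
        matches = [self._paths_by_tag.get(name, set()) for name in tags]
        return set.intersection(*matches)


# Made-up example files.
index = TagIndex()
index.tag("/home/me/rex_park.jpg", "image", "dog", "outdoors")
index.tag("/home/me/rex_sofa.jpg", "image", "dog")
index.tag("/home/me/hike.avi", "video", "outdoors")

dog_outdoors = index.query("image", "dog", "outdoors")
print(sorted(dog_outdoors))
```

The lookup is the easy part; as with any tagging scheme, somebody still has to apply the tags in the first place.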
"People who think they know everything are very annoying to those of us who do."-Mark Twain
I've used XFS on a RAID1 setup with SATA drives, and found the performance of the delete operation extremely dependent on how the partition was formatted.
I saw times of up to 5 minutes to delete a Linux kernel source tree on a partition that was formatted XFS with the defaults. I had to use something like sunit=64, swidth=64, and even then it took 5 seconds to rm -rf /usr/src/linux. I've heard that SAS drives wouldn't exhibit this slowness. Under Reiserfs on the same system, the delete took 1 second. Anyway, XFS is notorious for slow delete operations.
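For anyone wondering what sunit=64, swidth=64 actually means: mkfs.xfs takes both values in 512-byte units, and swidth is expected to be a multiple of sunit. A rough sketch of the arithmetic follows; the stripe_geometry helper is made up for illustration and is not part of any XFS tool.

```python
SECTOR_BYTES = 512  # mkfs.xfs expresses sunit and swidth in 512-byte units


def stripe_geometry(sunit, swidth):
    """Convert sunit/swidth (in 512-byte units) into bytes and a data-disk count."""
    if sunit <= 0 or swidth <= 0 or swidth % sunit != 0:
        raise ValueError("swidth must be a positive multiple of sunit")
    return {
        "stripe_unit_bytes": sunit * SECTOR_BYTES,
        "stripe_width_bytes": swidth * SECTOR_BYTES,
        "data_disks": swidth // sunit,
    }


geometry = stripe_geometry(64, 64)
print(geometry)
```

So those settings describe a 32 KiB stripe unit spread over a single data disk.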
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
Gee, I hope they're not using Ubuntu 9.10 by any chance: http://www.ubuntu.com/getubuntu/releasenotes/910
The damn bug is STILL not fixed apparently. Some people get the corruption, and some don't. Scares me enough to not even try using ext4 just yet, and I'm still surprised Canonical was stupid enough to have ext4 as the default filesystem in Karmic.
Then again, perhaps Google knows what they're doing.
When does black become white?
#CCCCCC or #888888
Is there overlap with Flamebait?
When does an otherwise 'troll' moderation-worthy comment lose out on status that could validate 19 responses, with 50% scoring +2?
Sometimes a troll is a troll, but sometimes it's just a shadow.
Is this why Google was down for about 30 minutes today? Did anyone else even experience this or was it a local issue?
I tried TagFS, and I found the main problem is that the tagging is way too much work to get to the level of tagging I want.
Also I avoid XFS, since it keeps huge amounts of (log?) data in RAM. So on a power failure, it’s goodbye data.
XFS is for servers with battery backup. Not for normal home computers.
I also tried JFS, and I got corruption with it. So I avoid it too.
I wish I could use ZFS... especially the scrubbing functionality.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Who gives a fuck about an Oxford comma?
SSD (NAND Flash) is still a block device. In fact, it's even "more" block, insomuch as it requires a filesystem a lot more aware of blocks, their limitations, and the proper way of using them (wear leveling, error correction, etc). It also uses larger blocks and also addresses groups of blocks for certain operations (erase). You either need a Flash-specific filesystem, or a translation to a more typical block device via a flash translation layer (FTL). Furthermore, I'm not aware of a single NAND Flash device that is accessible as memory mapped storage, nor can you run code from NAND, nor do I know of any CPUs capable of booting from NAND (they tend to have built-in ROM bootloaders to do the job). NOR Flash is another matter, but it's not competitive for SSDs. Going from HDDs to SSDs is hardly anything like going to RAM, except for the "solid state" part.
Everything you say is true about Flash, but not about SSDs in general. Flash can be written to one byte at a time, but then it is stuck in that state until it is erased. The circuitry for erasing is bigger than the circuitry for writing, so it is shared among a group of bytes in a cell. These can be any size, but there are trade-offs. The smaller you make them, the more copies of the erase circuit are needed, so the fewer bytes of storage you get per area of die size (and per dollar). The larger you make them, the more you need to erase to modify a single byte. I think most devices use 128KB cells, but I haven't really been paying attention.
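To put a number on that last trade-off, here is a back-of-the-envelope sketch. It takes the 128KB figure above at face value (such a unit is more commonly called an erase block) and assumes a naive read-erase-rewrite of every touched unit, with no remapping by a flash translation layer, and that the modified bytes are contiguous and start at a unit boundary. The function names are made up for illustration.

```python
ERASE_BLOCK_BYTES = 128 * 1024  # assumption: the 128KB figure quoted above


def bytes_rewritten(modified_bytes, erase_block_bytes=ERASE_BLOCK_BYTES):
    """Bytes that must be erased and rewritten to change `modified_bytes`.

    Assumes the modified range is contiguous and aligned to the start of an
    erase block, and that every touched block is rewritten in full.
    """
    if modified_bytes <= 0:
        raise ValueError("modified_bytes must be positive")
    blocks_touched = -(-modified_bytes // erase_block_bytes)  # ceiling division
    return blocks_touched * erase_block_bytes


def write_amplification(modified_bytes, erase_block_bytes=ERASE_BLOCK_BYTES):
    """Ratio of bytes physically rewritten to bytes logically modified."""
    return bytes_rewritten(modified_bytes, erase_block_bytes) / modified_bytes


print(bytes_rewritten(1))          # cost of changing a single byte
print(write_amplification(4096))   # cost ratio for a 4 KiB update
```

Under those assumptions, changing one byte means rewriting the full 131072 bytes, which is why real devices avoid rewriting in place.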
Other technologies, such as Magnetic RAM and Phase Change RAM that are starting to hit the market do not have these limitations. The most exciting technology at the moment is Phase Change RAM, which is slightly (about 50%) slower than DRAM, but is non-volatile. You can use it just like RAM, but the contents don't go away when you turn off the power. They're currently at around 64MB, so there's a way to go before they're hard drive replacements, but Flash was at that sort of capacity not long ago.
I am TheRaven on Soylent News
You can use ZFS. Just run FreeBSD or OpenSolaris. The amount of software that runs on Linux but not FreeBSD (particularly if you're talking about open-source) is exceedingly minimal.
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.