Google Switching To EXT4 Filesystem

Posted by timothy on Thursday January 14, 2010 @08:50AM from the make-money-with-open-source dept.

An anonymous reader writes "Google is in the process of upgrading their existing EXT2 filesystem to the new and improved EXT4 filesystem. Google has benchmarked three different filesystems — XFS, EXT4 and JFS. In their benchmarking, EXT4 and XFS performed equally well. However, in view of the easier upgrade path from EXT2 to EXT4, Google has decided to go ahead with EXT4."

276 of 348 comments

Time for a backup? by Itninja · 2010-01-14 08:52 · Score: 5, Informative

I guess now is as good as any to go through my Gmail and Google Docs and make local backups. I'm sure my info is safe, but I have been through these types of 'upgrades' at work before and every once in a while....well, let's just say backups are never a bad idea.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
1. Re:Time for a backup? by fuzzyfuzzyfungus · 2010-01-14 08:54 · Score: 4, Funny
  
  Not to worry. It's all in the cloud, right?
2. Re:Time for a backup? by Anonymous Coward · 2010-01-14 09:01 · Score: 2, Insightful
  
  Oh fuck off. It's not like Google is going to upgrade their entire multiply-redundant infrastructure all at once. And ext4 is a very conservative and stable FS. The "upgrade" process is to simply mount your old ext3 volume as ext4, and let new writes take advantage of ext4 features. If Google is actually still using ext2 rather than ext3, ext4 will be significantly *more* reliable. Not as good as XFS for preserving data integrity, but better than ext2.
3. Re:Time for a backup? by castironpigeon · 2010-01-14 09:02 · Score: 4, Funny
  
  Uh huh, the mushroom cloud.
  
  --
  mmmm...forbidden donut
4. Re:Time for a backup? by Monkeedude1212 · 2010-01-14 09:02 · Score: 1
  
  It sounds like EXT4 is fully compatible with 2 and 3, so even an EXT2 drive can be mounted as EXT4, which means the chances for failure are seriously reduced.
  But I totally hear what you're saying. Whenever you upgrade Anything, nothing is SUPPOSED to go wrong.
  However, it always does.
5. Re:Time for a backup? by spydum · 2010-01-14 09:03 · Score: 1
  
  Actually, they could. It's not like you pay anything for it.
6. Re:Time for a backup? by berashith · 2010-01-14 09:09 · Score: 1
  
  is the beta over yet? I don't give good SLAs on retention and recovery to dev systems.
7. Re:Time for a backup? by Anonymous Coward · 2010-01-14 09:10 · Score: 1, Interesting
  
  The "upgrade" process is to simply mount your old ext3 volume as ext4, and let new writes take advantage of ext4 features.
  You say that like it's a good thing. One error, like an assumption in the maximum number of files or clusters, causes a wrap round and it all goes tits up.
  It's not like they haven't dropped the ball before: http://www.techcrunch.com/2006/12/28/gmail-disaster-reports-of-mass-email-deletions/
  Do no evil, but be a bit incompetent sometimes.
8. Re:Time for a backup? by paradigm82 · 2010-01-14 09:16 · Score: 5, Funny
  
  It's probably nothing, probably. But I'm getting a small discrepancy in the file sizes...no, no, it's well within acceptable limits. Continue to stage 2.
9. Re:Time for a backup? by tool462 · 2010-01-14 09:17 · Score: 5, Funny
  
  I usually let the bit-gods decide what data I have that is important enough to save. Over the years the bit-gods have taught me that:
  Music files: not important, Styx crossed the Styx to /dev/null in 2002
  Essay written for sophomore year high school english: Important, I assume to haunt me in some future political race.
  Porn collection: Like the subject matter within, it swells impressively, explodes, then enters a refractory period until it's ready to build up again.
  C++ program that graphs the Mandelbrot set: Important. I like feeling like an explorer navigating the cardioid's canyons.
  Photos of my children: Not important. If I need more baby photos, I can just have more babies.
10. Re:Time for a backup? by at_slashdot · 2010-01-14 09:21 · Score: 2, Insightful
  
  "backups are never a bad idea."
  Depends. For example, you reduce the security of data with the number of backups you keep (you could encrypt them, but that has its own problems).
  
  --
  "It is our choices, Harry, that show what we truly are, far more than our abilities." -- Prof. Dumbledore
11. Re:Time for a backup? by BenLeeImp · 2010-01-14 09:25 · Score: 2, Insightful
  
  True, but they do make money off of your data. I'm pretty sure they will go to great lengths to protect their source of revenue.
12. Re:Time for a backup? by Itninja · 2010-01-14 09:28 · Score: 4, Funny
  
  Jeez, calm down junior! No need to open a can of fanboi on me....
  
  --
  I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
13. Re:Time for a backup? by Anonymous Coward · 2010-01-14 09:39 · Score: 5, Funny
  
  Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.
  The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.
  And don't forget that you have to use Web Services to access The Cloud. Nothing is more secure than SOA and Web Services, with the exception of perhaps SaaS. But I think that Cloud Services 2.0 will combine the tiers into an MVC-compliant stack that uses SaaS to increase the security and partitioning of the data.
  My main concern isn't with the security of The Cloud, but rather with getting my Indian team to learn all about it so we can deploy some first-generation The Cloud applications and Web Services to provide the ultimate platform upon which we can layer our business intelligence and reporting, because there are still a few verticals that we need to leverage before we can move to The Cloud 2.0.
14. Re:Time for a backup? by nemmi · 2010-01-14 09:45 · Score: 1
  
  No. No need to back it up. Google already has a backup. It is called the Dept. of Justice (DOJ). They are actually in the same building, but they just want to make sure the "terrorists" haven't made any "illegal searches" before you can have it back.
15. Re:Time for a backup? by Jake+Griffin · 2010-01-14 09:47 · Score: 1
  
  Yes, the beta was over in July, I believe. Or was that a joke?
  
  --
  SIG FAULT: Post index out of bounds.
16. Re:Time for a backup? by Anonymous Coward · 2010-01-14 09:57 · Score: 1, Funny
  
  Music files: not important, Styx crossed the Styx to /dev/null in 2002
  I wish I could mod you up for that line...
17. Re:Time for a backup? by gmuslera · 2010-01-14 09:58 · Score: 1
  
  Data integrity (and replication) is managed in a layer over the fs, so the journaling could be an unneeded hit to performance. That's probably why they didn't upgrade to ext3 long ago.
18. Re:Time for a backup? by Itninja · 2010-01-14 10:00 · Score: 1
  
  Free loaders? You like GRUB or LILO? I don't get it.
  
  --
  I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
19. Re:Time for a backup? by Lennie · 2010-01-14 10:03 · Score: 1
  
  There is the problem: I don't trust anyone with my data.
  
  --
  New things are always on the horizon
20. Re:Time for a backup? by icebraining · 2010-01-14 10:03 · Score: 1
  
  http://www.google.com/apps/
  
  --
  Dilbert RSS feed
21. Re:Time for a backup? by SomeJoel · 2010-01-14 10:04 · Score: 1
  
  I'm glad you posted anonymously, sir.
  
  --
  <Complete your profile by adding a signature!>
22. Re:Time for a backup? by lymond01 · 2010-01-14 10:20 · Score: 2, Insightful
  
  If Google is actually still using ext2 rather than ext3, ext4 will be significantly *more* reliable.
  It ain't the destination, it's the journey that worries me.
23. Re:Time for a backup? by ajs · 2010-01-14 10:57 · Score: 1
  
  I guess now is as good as any to go through my Gmail and Google Docs and make local backups. I'm sure my info is safe, but I have been through these types of 'upgrades' at work before and every once in a while....well, let's just say backups are never a bad idea.
  What makes you think that gmail or gdocs is going to be affected? Your data is almost certainly stored in a database. It's possible that that database is stored on a filesystem (as opposed to a raw device, which I wouldn't be at all surprised to see), but even then you're talking about something that's far less discrete than a bunch of text files lying around on a filesystem.
  What's actually kind of amusing is you've never known when or if they've updated that database and yet your life has continued along smoothly.
24. Re:Time for a backup? by mR.bRiGhTsId3 · 2010-01-14 11:08 · Score: 1
  
  I don't think google actually cares about data integrity at the machine level. They have built-in fault tolerance at higher levels of their stack like GFS.
25. Re:Time for a backup? by berashith · 2010-01-14 11:25 · Score: 1
  
  are you asking if my comment is a joke, or the declared end of the beta?
  I was aiming for a snarky cynical cheap shot. twice
26. Re:Time for a backup? by XaXXon · 2010-01-14 11:26 · Score: 3, Informative
  
  Half life.
27. Re:Time for a backup? by Itninja · 2010-01-14 11:30 · Score: 1
  
  I think what you mean is 'they are likely safer...'. Unless you have some hard data, I hesitate to think Google's backups are any safer than mine. They have no vested interest in keeping my poetry and email thread with my sister safe, whereas I do.
  
  --
  I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
28. Re:Time for a backup? by pz · 2010-01-14 11:59 · Score: 3, Funny
  
  Based on the movie 2001:
  HAL: "Sorry about this, I know it's a bit silly...just a moment...just a moment... I've just picked up a fault in the AE35 unit. It's going to go 100% failure within 72 hours."
  Dave:"It's still within operational limits right now?"
  HAL:"Yes. And it will stay that way till it fails."
  I don't have my copy of the book handy to check the original dialogue.
  
  --
  
  Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
29. Re:Time for a backup? by Conditioner · 2010-01-14 12:19 · Score: 1
  
  Half-Life
30. Re:Time for a backup? by Ginger+Unicorn · 2010-01-14 13:46 · Score: 2, Informative
  Presumably you're a troll, since the link that you gave explicitly states the following:
  
  It was a firefox exploit, not a problem with google's servers
  Only 60 people were affected. "mass email deletions" indeed.
  --
  (1.21 gigawatts) / (88 miles per hour) = 30 757 874 newtons
31. Re:Time for a backup? by fostware · 2010-01-14 13:55 · Score: 1
  
  I usually let the bit-gods decide what data I have that is important enough to save. Over the years the bit-gods have taught me that:
  <snip>
  Photos of my children: Not important. If I need more baby photos, I can just have more babies.
  Let me know how well "sudo make me a baby" (xkcd style) works out for you?
  
  --
  "We know what happens to people who stay in the middle of the road. They get run over." - Aneurin Bevan
32. Re:Time for a backup? by symbolset · 2010-01-14 14:37 · Score: 1
  
  Even Microsoft wouldn't do that. They would be in danger of losing the data.
  /ducks, runs.
  
  --
  Help stamp out iliturcy.
33. Re:Time for a backup? by BrokenHalo · 2010-01-14 15:31 · Score: 1
  
  Your link, while interesting, appears to have nothing to do with filesystems. Ext4 might be numerically more advanced than ext2, but the latter still has a useful place in the scheme of things if we accept the proviso that proper steps are taken to secure the data. I still use ext2 for boot partitions on general-purpose Linux boxes. I mount these read-only in the interests of security, but that means, of course, that I can't have journalling on them, which precludes the use of ext3 or 4.
  
  Google probably doesn't have a requirement for read-only partitions on machines that have only one purpose, so the uptime benefits are a more compelling factor.
34. Re:Time for a backup? by bennomatic · 2010-01-14 15:41 · Score: 1
  
  Did you AC troll yourself for the sake of a punch line?
  
  --
  The CB App. What's your 20?
35. Re:Time for a backup? by Itninja · 2010-01-14 16:25 · Score: 1
  
  No, but that would have been brilliant! Next time...
  
  --
  I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
36. Re:Time for a backup? by mlts · 2010-01-14 17:15 · Score: 1
  
  If you have decent encryption, the number of backups doesn't matter. By decent, I mean using a known good program. On the commercial Windows front, Retrospect and Backup Exec both have government certified [1] AES library implementations. For large hard disks, depending on OS, there is TrueCrypt, OS X's Disk Utility, EncFS, BitLocker To Go, and LUKS. For tapes, HP sells tape drives (LTO4s) that use AES encryption in hardware [2].
  With all the encryption options, having multiple backups isn't going to reduce the security by any real amount, assuming the keys are protected and stored well.
  [1]: Certified can be argued to not mean secure, but it means that the company paid for a third party to look at the AES implementation and confirm it meets standards.
  [2]: With tape drives that support SPIN/SPOUT functionality, you can manually set a key for every tape, or let backup software handle the key management for you.
37. Re:Time for a backup? by tytso · 2010-01-14 17:18 · Score: 2, Informative
  
  >I mount these read-only in the interests of security, but that means, of course,
  >that I can't have journalling on them, which precludes the use of ext3 or 4.
  #1. you can mount ext3 file systems read-only. The journal doesn't preclude a ro mount.
  #2. ext4 supports running without a journal. Google engineers contributed that code to ext4 last year.
38. Re:Time for a backup? by Basje · 2010-01-14 21:13 · Score: 1
  
  Yup, but when it rains it pours. Then the cloud is down the drain.
  
  --
  the pun is mightier than the sword
39. Re:Time for a backup? by vegiVamp · 2010-01-14 22:52 · Score: 1
  
  A comment very similar to this one has appeared on slashdot cloud articles before - almost verbatim, I'd say. Can't help but wonder if you're one of those losers who keep logs of comments they like so they can copy/paste them later.
  
  --
  What a depressingly stupid machine.
40. Re:Time for a backup? by Fred_A · 2010-01-14 23:28 · Score: 1
  
  Not to worry. It's all in the cloud, right?
  The trouble with stuff that's in clouds is that sometimes it rains...
  
  --
  
  May contain traces of nut.
  Made from the freshest electrons.
41. Re:Time for a backup? by naglep · 2010-01-15 00:02 · Score: 4, Funny
  
  A comment very very similar to this one has appeared on slashdot cloud articles before - almost verbatim, I'd say. Can't help but wonder if you're one of those losers who keep logs of comments they like so they can copy/paste them later.
42. Re:Time for a backup? by Simetrical · 2010-01-15 05:22 · Score: 1
  
  If Google is actually still using ext2 rather than ext3, ext4 will be significantly *more* reliable.
  They don't care. I can't remember where I read it, but I read that they were using ext2 since they have no reason to use journaling – if a machine crashes, they just reimage it. GFS ensures that everything is copied to multiple nodes, maybe even in physically disparate locations, so there's no need for recovery (such as via journaling) of individual nodes that have failed. ext2 is just ext3 with journaling disabled.
  What Google wants, as the summary suggests, is performance, and ext4 will certainly provide that compared to ext2/3.
  
  --
  MediaWiki developer, Total War Center sysadmin
43. Re:Time for a backup? by tool462 · 2010-01-15 08:19 · Score: 1
  
  This one (http://xkcd.com/387/) was involved in the discussion when we decided to have kids.
44. Re:Time for a backup? by equivocal · 2010-01-15 08:53 · Score: 1
  
  Ask Tynt. They know what people have been copying/pasting.
45. Re:Time for a backup? by BrokenHalo · 2010-01-16 04:43 · Score: 1
  
  I'm aware that ext4 can run without a journal, but isn't that functionally equivalent to leaving it as ext2?
46. Re:Time for a backup? by RichiH · 2010-01-16 05:13 · Score: 1
  
  > It ain't the destination, it's the journey that worries me.
  Are you aware of how such an upgrade works?
  PS: XFS is better, anyway :p
47. Re:Time for a backup? by tytso · 2010-01-17 03:40 · Score: 1
  
  I'm aware that ext4 can run without a journal, but isn't that functionally equivalent to leaving it as ext2?
  With ext4 you get the benefits of extents, delayed allocation, and other new-to-ext4 features. You also get directory hash trees, which were introduced in ext3 and are therefore not in ext2. Running without the journal means you have to run a full fsck after an unclean shutdown, but you still get all of the new features and performance improvements of ext4.
48. Re:Time for a backup? by bennomatic · 2010-01-23 19:26 · Score: 1
  
  Glad you appreciated the idea.
  
  --
  The CB App. What's your 20?
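tytso's reply above lists extents as one of the things ext4 adds over ext2. As a rough illustration (a toy model, not ext4's on-disk format): an ext2-style block map records one physical block per logical block of a file, while an extent records a whole contiguous run as (logical start, physical start, length). The sketch below collapses a per-block map into such runs; the function name and tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical toy layout: (logical_start, physical_start, length).
Extent = Tuple[int, int, int]


def blocks_to_extents(block_map: List[int]) -> List[Extent]:
    """Collapse a per-block map (index = logical block, value = physical
    block, as in an ext2-style block list) into contiguous extents."""
    extents: List[Extent] = []
    for logical, physical in enumerate(block_map):
        if extents:
            l_start, p_start, length = extents[-1]
            # Extend the current run if this block sits right after it on disk.
            if physical == p_start + length:
                extents[-1] = (l_start, p_start, length + 1)
                continue
        # Otherwise start a new run at this logical block.
        extents.append((logical, physical, 1))
    return extents


if __name__ == "__main__":
    # A 7-block file stored in two contiguous runs on disk.
    block_map = [1000, 1001, 1002, 1003, 5000, 5001, 5002]
    print(blocks_to_extents(block_map))  # [(0, 1000, 4), (4, 5000, 3)]
```

Seven block pointers become two extent records; for a large, mostly contiguous file the saving is far bigger. Real ext4 also caps the length of a single extent, which this sketch ignores.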
Slashdotted already ? by ccandreva · 2010-01-14 08:54 · Score: 1

Looks like Digitizor already melted.
1. Re:Slashdotted already ? by lalena · 2010-01-14 08:57 · Score: 1
  
  Yeah, it was down by the time there were 2 posts in /.
2. Re:Slashdotted already ? by spazdor · 2010-01-14 08:58 · Score: 1
  
  Must be all that journalizing the webserver's gotta do.
  
  --
  DRM: Terminator crops for your mind!
3. Re:Slashdotted already ? by Anonymous Coward · 2010-01-14 09:00 · Score: 1, Informative
  
  Phoronix has the story
  http://www.phoronix.com/scan.php?page=news_item&px=Nzg4MA
Use of commas. by Anonymous Coward · 2010-01-14 08:57 · Score: 4, Funny

Eats, shoots and leaves. Read it.
1. Re:Use of commas. by schon · 2010-01-14 09:03 · Score: 4, Funny
  
  Maybe it was submitted by William Shatner?
2. Re:Use of commas. by Em+Emalb · 2010-01-14 09:09 · Score: 1, Insightful
  
  Why do I put a comma before the and in a list?
  I would say "I have a cat, a dog, and two goats."
  But you would say "I have a cat, a dog and two goats." (Then you'd bugger the goats, but that's how you roll.)
  The English language is so damned weird...but AC is right, illegal use of commas. That's a 15 karma penalty. 1st down.
  
  --
  Sent from your iPad.
3. Re:Use of commas. by natehoy · 2010-01-14 09:09 · Score: 1
  
  Nope, that can't be it. There aren't any exclamation points.
  
  --
  "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
4. Re:Use of commas. by Darth+Sdlavrot · 2010-01-14 09:29 · Score: 2, Informative
  
  Why do I put a comma before the and in a list?
  I would say "I have a cat, a dog, and two goats."
  But you would say "I have a cat, a dog and two goats."
  The English language is so damned weird...but AC is right, illegal use of commas. That's a 15 karma penalty. 1st down.
  I too add the comma in lists of discrete items -- not sure where I learned it.
  If some items are connected or related in some way that's distinct from the other items in the list I'd omit the comma. Not a great example: "I have a cat, a daughter and a son, a car and a motorcycle, and a swimming pool."
  I notice that the Brits (and Canucks, Aussies, etc., tend to always omit the comma.
  Could be an Americanism?
  (And I suspect you really write it, not "say" it.)
5. Re:Use of commas. by AvitarX · 2010-01-14 09:33 · Score: 2, Informative
  
  There is no hard rule on this, and both can be ambiguous in different circumstances.
  http://en.wikipedia.org/wiki/Serial_comma
  
  --
  Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
6. Re:Use of commas. by Lennie · 2010-01-14 10:06 · Score: 1
  
  I always thought that punctuation like a ',' is (among other things) like a pause in speech. If you look at it like that, you can add a ',' anywhere you want to pause. You can pause before the 'and'.
  
  --
  New things are always on the horizon
7. Re:Use of commas. by SomeJoel · 2010-01-14 10:07 · Score: 1
  
  That's a 15 karma penalty. 1st down.
  A defensive penalty?! You've, got to be, joking.
  
  --
  <Complete your profile by adding a signature!>
8. Re:Use of commas. by dloose · 2010-01-14 10:10 · Score: 1, Interesting
  
  Who gives a fuck about an Oxford comma?
9. Re:Use of commas. by quickOnTheUptake · 2010-01-14 11:30 · Score: 1
  
  Could be an Americanism?
  It's called the "Oxford comma" so I suspect not.
  
  --
  Mod points: Guaranteed to remove your sense of humor.
  Side effects may include gullibility and temporary retardation
10. Re:Use of commas. by quickOnTheUptake · 2010-01-14 11:50 · Score: 1
  
  I should add that there are many cases where contemporary American use is as old as or older than the contemporary British use.
  Others would be the preservation of 'gotten' ("He's gotten much better.") and certain uses of the subjunctive (e.g., "He insisted that he be given . . .").
  
  --
  Mod points: Guaranteed to remove your sense of humor.
  Side effects may include gullibility and temporary retardation
11. Re:Use of commas. by ttldkns · 2010-01-14 11:52 · Score: 1
  
  http://en.wikipedia.org/wiki/Serial_comma
  while technically incorrect usage and shunned by many academics I've met, as a computer programmer it sits better with me to have each term in a list or array of objects accurately comma delimited. It seems stupid to me to rely on re-arranging a list because of the ambiguity an and term can create.
  
  --
  How many computers are too many?
12. Re:Use of commas. by Nutria · 2010-01-14 13:34 · Score: 1
  
  The car comes in several colours: white, red, green, black and white.
  If you're going to lecture us on grammar, don't make obvious grammatical mistakes. A colon goes after "colours".
  
  --
  "I don't know, therefore Aliens" Wafflebox1
13. Re:Use of commas. by Xabraxas · 2010-01-14 14:44 · Score: 1
  
  Actually both ways are now accepted, although the former was not considered correct when I was in grade school.
  
  --
  Time makes more converts than reason
14. Re:Use of commas. by cdrudge · 2010-01-14 16:25 · Score: 1
  
  I notice that the Brits (and Canucks, Aussies, etc., tend to always omit the comma.
  I'm pretty sure though that all flavors of English tend to use parentheses in pairs. ;)
15. Re:Use of commas. by osu-neko · 2010-01-14 22:55 · Score: 1
  
  Actually both ways are now accepted, although the former was not considered correct when I was in grade school.
  Or, more accurately, it was considered correct when you were in grade school, but not by your grade school teacher.
  
  --
  "Convictions are more dangerous enemies of truth than lies."
16. Re:Use of commas. by vegiVamp · 2010-01-14 22:55 · Score: 1
  
  I suspect the comma comes from the natural pause that also occurs in speech after every item, and thus also right before the "and finalitem". It's another speaking habit filtering through into writing.
  
  --
  What a depressingly stupid machine.
17. Re:Use of commas. by bluefoxlucid · 2010-01-15 04:57 · Score: 1
  
  Dude, it's data organization. Do you want spam, fries, and eggs? Or spam fries and eggs? Or spam fries, sausage, and eggs? For that matter, you can have eggs and bacon; eggs, sausage, and bacon; or eggs, sausage, biscuits, grits, and shredded potatoes.
  
  --
  Support my political activism on Patreon.
18. Re:Use of commas. by Darth+Sdlavrot · 2010-01-15 05:02 · Score: 1
  
  Indeed. You have a keen eye for the obvious.
  Normally that's my job, but I'm happy to delegate.
19. Re:Use of commas. by vegiVamp · 2010-01-15 05:14 · Score: 1
  
  No, I want spam, fries and eggs. The use of "and" obviates the need for the comma as a list separator.
  
  --
  What a depressingly stupid machine.
20. Re:Use of commas. by rezza · 2010-01-15 06:39 · Score: 1
  
  "On the menu today we have pasta, steak and chips and custard."
  vs
  "On the menu today we have pasta, steak and chips, and custard."
21. Re:Use of commas. by vegiVamp · 2010-01-19 02:20 · Score: 1
  
  Special case: it's required for disambiguation.
  
  --
  What a depressingly stupid machine.
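The programmer's argument for the serial comma in this thread (every item comma-delimited, so the list never has to be rearranged to avoid ambiguity) is easy to see in code. A minimal sketch; the function name and the `oxford` flag are made up for illustration, and the menu is rezza's example from above:

```python
from typing import List


def join_items(items: List[str], oxford: bool = True) -> str:
    """Join items into an English list, optionally with a serial (Oxford) comma."""
    if len(items) <= 2:
        # "a" or "a and b": no comma question arises.
        return " and ".join(items)
    head = ", ".join(items[:-1])
    sep = ", and " if oxford else " and "
    return head + sep + items[-1]


if __name__ == "__main__":
    menu = ["pasta", "steak and chips", "custard"]
    print(join_items(menu, oxford=False))  # pasta, steak and chips and custard
    print(join_items(menu))                # pasta, steak and chips, and custard
```

Without the serial comma, the output no longer shows where "steak and chips" ends and "custard" begins, which is exactly the ambiguity the menu example points out.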
Not A Nerd? by TheNinjaroach · 2010-01-14 08:58 · Score: 2, Insightful

News for nerds. Stuff that matters.

Not that I RTFA or anything, but I find it interesting that XFS and EXT4 both appear to be equally impressive with benchmarks, and it's implied they are both better than JFS. You must not be a nerd.

--
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
1. Re:Not A Nerd? by MBGMorden · 2010-01-14 09:11 · Score: 3, Interesting
  
  I too found it interesting, because it basically alleviates any need for me to worry about "upgrading" to ext4. My current Linux systems use an ext3 /boot partition and xfs for everything else. Given some of the press ext4 has gotten lately, I just trust xfs more, and knowing that I'm not really giving up any performance is a huge plus.
  Truthfully though, where the heck are the meta-data based filesystems that we were promised? I'd love to be able to, on a filesystem level, instantly pull up a folder view of all videos - or all images. Or all images of my dog. Or all images outdoors. Or all images of my dog outdoors.
  Basically, just the ability to organize via an arbitrary number of categorized tags.
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
2. Re:Not A Nerd? by gazbo · 2010-01-14 09:19 · Score: 1
  
  As another home user I too find it illuminating which FS benchmarks best for Google's workload.
3. Re:Not A Nerd? by Hurricane78 · 2010-01-14 10:05 · Score: 2, Interesting
  
  I tried TagFS. The main problem I found is that the tagging is way too much work to get to the level of tagging I want.
  Also I avoid XFS, since it keeps huge amounts of (log?) data in RAM. So on a power failure, it’s goodbye data.
  XFS is for servers with battery backup. Not for normal home computers.
  I also tried JFS, and I got corruption with it. So I avoid it too.
  I wish I could use ZFS... especially the scrubbing functionality.
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
4. Re:Not A Nerd? by mlts · 2010-01-14 10:05 · Score: 1
  
  I'm sticking with ext3 because it has been tried and true, with few reports of data loss due to the filesystem. The only filesystem which I'd upgrade to would be btrfs once that becomes production ready.
  Meta tags would be nice to have, I agree there. Another thing I'd love to have is where the filesystem stored a SHA-256 or SHA-512 hash of files. This would be excellent for backups because all the backup program would have to do for deduplication would essentially be to pull the hashes, take the first file if there are multiple with the same hash. No need to guess if a file is changed by the mtime.
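mlts's hash-based deduplication can already be approximated in user space with Python's standard `hashlib`. This is a minimal sketch, assuming the SHA-256 digests are computed on the fly rather than read from the filesystem; the function names, the 1 MiB chunk size, and the demo file names are hypothetical choices.

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe(paths: List[str]) -> Dict[str, List[str]]:
    """Map each kept file to the later files with the same SHA-256.
    The first path seen for a given hash is the one kept."""
    first_by_hash: Dict[str, str] = {}
    duplicates: Dict[str, List[str]] = {}
    for path in paths:
        digest = sha256_of(path)
        if digest in first_by_hash:
            duplicates[first_by_hash[digest]].append(path)
        else:
            first_by_hash[digest] = path
            duplicates[path] = []
    return duplicates


def demo() -> Dict[str, List[str]]:
    """Write three small files to a temp dir and report duplicates by name."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name, data in [("a.txt", b"same"), ("b.txt", b"other"), ("c.txt", b"same")]:
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(data)
            paths.append(path)
        result = dedupe(paths)
        return {os.path.basename(k): [os.path.basename(v) for v in vs]
                for k, vs in result.items()}


if __name__ == "__main__":
    print(demo())  # {'a.txt': ['c.txt'], 'b.txt': []}
```

A real backup tool would usually compare file sizes first and only hash files whose sizes collide; that shortcut is left out here to keep the idea visible.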
5. Re:Not A Nerd? by IdleTime · 2010-01-14 10:27 · Score: 1
  
  The filesystem they should have used is btrfs, http://btrfs.wiki.kernel.org/index.php/Main_Page
  
  But I guess it's a bit too early yet. I'm running it on a 1TB drive without any problems and it is F-A-S-T.
  
  --
  If you mod me down, I *will* introduce you to my sister!
6. Re:Not A Nerd? by icebraining · 2010-01-14 10:31 · Score: 1
  
  A tag FS shouldn't be built into this kind of filesystem, but as a layer on top.
  There's Trackerfs, that uses Tracker:
  
  Trackerfs is a FUSE module that connects to a running Tracker interface via DBus and populates a directory with symlinks corresponding to a Tracker query. The name of a directory will determine the query, e.g. 'home' will contain symlinks to the results of the Tracker query for 'home'. Right now, Trackerfs is limited to one query per filesystem.
  http://code.google.com/p/trackerfs/
  
  --
  Dilbert RSS feed
7. Re:Not A Nerd? by Anonymous Coward · 2010-01-14 10:36 · Score: 1, Funny
  
  or all images. Or all images of my dog. Or all images outdoors. Or all images of my dog outdoors.
  Seems like you have a different kind of porn to everyone else...
8. Re:Not A Nerd? by BikeHelmet · 2010-01-14 10:45 · Score: 1
  
  There are file-searching apps that can do this.
  They have to maintain DBs of tags for every file. I'm not sure how they cope with files being moved. Maybe identically named files get hashed to see if the tag still applies?
  Hmm... this is starting to sound like it might bog down the FS if it applied to hundreds of thousands of files. Even a simple file searcher takes minutes to run on a modern HDD. I can't imagine how long hashing everything would take. But it'd probably be really fast when searching.
  What I've started doing is treating subfolders as tags. I can search out "Family Holiday 2009 whale" and "F:\Family\Pictures\Holidays\2009 trip\Whale_001.jpg" will come up.
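BikeHelmet's folders-as-tags approach amounts to treating every path component as a set of search words. A minimal sketch, assuming a simple tokenizer that splits components on spaces, underscores, dots, hyphens, backslashes, and colons and lowercases the result, plus prefix matching so that "Holiday" finds "Holidays"; the function names and the second example path are illustrative, not taken from any particular search tool.

```python
import re
from pathlib import PureWindowsPath
from typing import Iterable, List, Set


def path_tags(path: str) -> Set[str]:
    """Turn every folder and file-name component into lowercase tags."""
    tags: Set[str] = set()
    for part in PureWindowsPath(path).parts:
        tags.update(t.lower() for t in re.split(r"[\s_.\-\\:]+", part) if t)
    return tags


def search(paths: Iterable[str], query: str) -> List[str]:
    """Return paths where every query word is a prefix of at least one tag."""
    words = [w.lower() for w in query.split()]
    hits = []
    for path in paths:
        tags = path_tags(path)
        if all(any(tag.startswith(w) for tag in tags) for w in words):
            hits.append(path)
    return hits


if __name__ == "__main__":
    photos = [
        r"F:\Family\Pictures\Holidays\2009 trip\Whale_001.jpg",
        r"F:\Family\Pictures\Birthday\2009\Cake_001.jpg",
    ]
    print(search(photos, "Family Holiday 2009 whale"))
```

Only the whale photo matches, because the birthday path has no tag starting with "holiday" or "whale". A real indexer would compute the tag sets once and store them, rather than re-tokenizing every path on each query.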
9. Re:Not A Nerd? by aliquis · 2010-01-14 10:50 · Score: 1
  
  Maybe he's a mac nerd?
  Yeah, I know it's an oxymoron.
10. Re:Not A Nerd? by jlund · 2010-01-14 10:51 · Score: 2, Informative
  
  Truthfully though, where the heck are the meta-data based filesystems that we were promised? I've love to be able to, on a filesystem level, instantly pull up a folder view of all videos - or all images. Or all images of my dog. Or all images outdoors. Or all images of my dog outdoors.
  Basically, just the ability to organize via an arbitrary number of categorized tags.
  You must be referring to WinFS... Oh wait, it never shipped, but is in development.
11. Re:Not A Nerd? by Archangel+Michael · 2010-01-14 11:32 · Score: 2, Informative
  
  Truthfully though, where the heck are the meta-data based filesystems that we were promised
  I suspect that once we get over the BLOCK LEVEL DEVICE (BLD) paradigm, and into SSDs that are NOT mimicking BLD, we'll have something closer to what you want.
  The problem with moving from BLD, is that we've been using them for so long that I'm not sure there is any good way to make the switch to straight linear addressing of memory for ALL storage.
  In fact, I would suspect that our idea of "booting" is necessarily going to have to change, from BLD bootstrap to just doing a memory move from slower to faster memory (SSD to RAM to Level 3, 2 and on die Cache).
  It is going to need a different way of looking at how we use storage from near to far off, from slow to fast(er)
  We're gonna have to index memory somehow, and track the bits.
  
  --
  Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
12. Re:Not A Nerd? by marcansoft · 2010-01-14 11:51 · Score: 2, Interesting
  
  SSD (NAND Flash) is still a block device. In fact, it's even "more" block, insomuch as it requires a filesystem a lot more aware of blocks, their limitations, and the proper way of using them (wear leveling, error correction, etc). It also uses larger blocks and also addresses groups of blocks for certain operations (erase). You either need a Flash-specific filesystem, or a translation to a more typical block device via a flash translation layer (FTL). Furthermore, I'm not aware of a single NAND Flash device that is accessible as memory mapped storage, nor can you run code from NAND, nor do I know of any CPUs capable of booting from NAND (they tend to have built-in ROM bootloaders to do the job). NOR Flash is another matter, but it's not competitive for SSDs. Going from HDDs to SSDs is hardly anything like going to RAM, except for the "solid state" part.
13. Re:Not A Nerd? by TheRaven64 · 2010-01-14 12:16 · Score: 2, Interesting
  
  Everything you say is true about Flash, but not about SSDs in general. Flash can be written to one byte at a time, but then it is stuck in that state until it is erased. The circuitry for erasing is bigger than the circuitry for writing, so it is shared among a group of bytes in a cell. These can be any size, but there are trade-offs. The smaller you make them, the more copies of the erase circuit are needed, so the fewer bytes of storage you get per unit of die area (and per dollar). The larger you make them, the more you need to erase to modify a single byte. I think most devices use 128KB cells, but I haven't really been paying attention.
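  As a rough illustration of that trade-off, the sketch below models updating one byte in place when the erase unit is a whole block. It assumes the 128KB block size guessed above and ignores any translation layer (real SSDs avoid this cost by remapping writes elsewhere); `update_in_place` is a hypothetical helper.

```python
ERASE_BLOCK = 128 * 1024  # assumed erase-block size (the "128KB" guess above)


def update_in_place(block: bytearray, offset: int, value: int) -> int:
    """Change one byte of a flash erase block with no FTL to hide the cost.

    Flash bits can only be set back to 1 by erasing the whole block, so the
    update is: read the block out, erase it, then program it all back.
    Returns the number of bytes that had to be rewritten.
    """
    saved = bytearray(block)          # 1. copy the whole block out
    block[:] = b"\xff" * len(block)   # 2. erase: every bit returns to 1
    saved[offset] = value             # 3. apply the one-byte change to the copy
    block[:] = saved                  # 4. program the entire block back
    return len(block)
```

  Changing a single byte this way rewrites all 131,072 bytes of the block, which is the cost side of making erase blocks large.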
  Other technologies, such as Magnetic RAM and Phase Change RAM that are starting to hit the market do not have these limitations. The most exciting technology at the moment is Phase Change RAM, which is slightly (about 50%) slower than DRAM, but is non-volatile. You can use it just like RAM, but the contents don't go away when you turn off the power. They're currently at around 64MB, so there's a way to go before they're hard drive replacements, but Flash was at that sort of capacity not long ago.
  
  --
  I am TheRaven on Soylent News
14. Re:Not A Nerd? by smash · 2010-01-14 12:18 · Score: 3, Interesting
  
  You can use ZFS. Just run FreeBSD or OpenSolaris. The amount of software that runs on Linux but not FreeBSD (particularly if you're talking about open source) is exceedingly minimal.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
15. Re:Not A Nerd? by Jurily · 2010-01-14 12:48 · Score: 1
  
  Truthfully though, where the heck are the meta-data based filesystems that we were promised?
  We've been writing software for tree-based filesystems for 40 years, and unless you want to port all those, the best we could do is a hybrid. Yeah, it'd be nice for holiday pictures, but what about system files?
16. Re:Not A Nerd? by marcansoft · 2010-01-14 12:53 · Score: 1
  
  Modern NAND cannot be written one byte at a time. You can only write full pages (that's the Flash term for a block, usually 2K or so). For MLC NAND Flash (most common these days, as it has higher density) you can only do this write once between erases, so you're stuck writing 2K at a time. For SLC NAND, you can write each page multiple times (usually 4 or so) between erases, though of course you can only flip bits from 1 to 0, not vice versa.
  It is true that other technologies behave more like RAM, but so far none of them are viable for what we call SSDs today. This may change in the future. My comment was about current SSDs.
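  A minimal Python model of the page rules above, assuming a single page stands in for an erase block: programming can only clear bits (the new contents are the old contents ANDed with the data), a page accepts a limited number of programs between erases (1 for MLC, around 4 for SLC per the figures above), and only an erase sets bits back to 1. `ToyNandPage` is hypothetical; real parts also have ECC, spare areas, and vendor-specific partial-program limits.

```python
class ToyNandPage:
    """Hypothetical model of one NAND page's program/erase rules."""

    def __init__(self, size: int = 2048, max_programs: int = 1) -> None:
        self.data = bytearray(b"\xff" * size)  # erased state: all bits are 1
        self.max_programs = max_programs       # 1 for MLC, ~4 for SLC per the comment
        self.programs = 0

    def program(self, payload: bytes) -> None:
        if len(payload) != len(self.data):
            raise ValueError("NAND is programmed a full page at a time")
        if self.programs >= self.max_programs:
            raise RuntimeError("page must be erased before programming it again")
        for i, b in enumerate(payload):
            self.data[i] &= b  # programming can only flip bits from 1 to 0
        self.programs += 1

    def erase(self) -> None:
        # Real NAND erases a whole block of pages; one page stands in for it here.
        self.data[:] = b"\xff" * len(self.data)
        self.programs = 0
```

  On an SLC-style page programmed twice, the second program cannot turn the first byte's cleared bits back to 1: programming 0x0F and then 0xF0 leaves 0x00.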
17. Re:Not A Nerd? by budgenator · 2010-01-14 13:10 · Score: 1
  
  I'm not sure it makes any real difference. Isn't an EXT3 /boot partition read as EXT2 at boot, and then almost never written to the rest of the time?
  
  --
  Apocalypse Cancelled, Sorry, No Ticket Refunds
18. Re:Not A Nerd? by MichaelSmith · 2010-01-14 14:22 · Score: 1
  
  I installed Ubuntu (I think 9.04 or 8.10) on my work machine and proceeded to do some version control hacking which involved creating a very large number of files on ext4. The file system ran out of inodes when about 40% of disk space had been used. I think it may have been partly a configuration issue, but I think it is better if our ancestors (so to speak) can make the fatal mistakes.
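  Running out of inodes with disk space to spare is a matter of arithmetic: ext2/3/4 fix the number of inodes when the filesystem is created, commonly one inode per 16 KiB of capacity (the `inode_ratio` in many `mke2fs.conf` files; it can be overridden with `mke2fs -i` or `-N`). The Python sketch below assumes 4 KiB blocks, ignores directory and metadata overhead, and uses hypothetical helper names.

```python
BLOCK_SIZE = 4096            # assumed filesystem block size in bytes
DEFAULT_INODE_RATIO = 16384  # bytes of capacity per inode (common mke2fs default)


def inode_count(fs_bytes: int, bytes_per_inode: int = DEFAULT_INODE_RATIO) -> int:
    """Inodes created at mkfs time; the count is fixed afterwards."""
    return fs_bytes // bytes_per_inode


def fraction_full_when_out_of_inodes(avg_file_bytes: int,
                                     bytes_per_inode: int = DEFAULT_INODE_RATIO) -> float:
    """Share of capacity allocated once every inode holds a file of this size.

    Each file takes one inode and a whole number of blocks, so when files are
    much smaller than bytes_per_inode the inodes run out before the blocks do.
    """
    blocks_per_file = -(-avg_file_bytes // BLOCK_SIZE)  # ceiling division
    return blocks_per_file * BLOCK_SIZE / bytes_per_inode
```

  With one-block files the inodes are gone when the disk is only 25% allocated, and two-block files get you to 50%. A version-control tree full of small files sits somewhere in that range, so a figure like 40% is plausible from the default ratio alone.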
  
  --
  http://michaelsmith.id.au
19. Re:Not A Nerd? by opposabledumbs · 2010-01-14 14:34 · Score: 1
  
  Or just a different term for the protagonist in said pr0n...
20. Re:Not A Nerd? by chromas · 2010-01-14 14:44 · Score: 1
  
  A filesystem is a database.
21. Re:Not A Nerd? by smash · 2010-01-14 15:15 · Score: 1
  
  Yeah, pre-beta software that has not been proven on any real world workload sounds like just the ticket!
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
22. Re:Not A Nerd? by MBGMorden · 2010-01-14 16:22 · Score: 1
  
  Make a compatibility layer that defines a tag category "directory" and store the traditional directory tree position there. Legacy apps could then navigate that way.
  For system files and newer apps, just have a file type attribute that is set to "system". Further tags could further organize them (for instance, files could be tagged "system" and "executable", or "system" and "configuration").
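  A rough sketch of that compatibility idea, assuming an in-memory index: every file carries a set of tags, a query is a set intersection, and a legacy "directory listing" is just a query for a `directory:<path>` tag. The `TagStore` class and the `directory:` prefix are hypothetical, not an existing filesystem's API.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class TagStore:
    """Hypothetical tag index; legacy directories become 'directory:<path>' tags."""

    def __init__(self) -> None:
        self.tags: DefaultDict[str, Set[str]] = defaultdict(set)   # file -> tags
        self.files: DefaultDict[str, Set[str]] = defaultdict(set)  # tag -> files

    def add(self, file_id: str, *tags: str) -> None:
        for tag in tags:
            self.tags[file_id].add(tag)
            self.files[tag].add(file_id)

    def query(self, *tags: str) -> Set[str]:
        """Files that carry every requested tag (a set intersection)."""
        if not tags:
            return set()
        result = set(self.files.get(tags[0], set()))
        for tag in tags[1:]:
            result &= self.files.get(tag, set())
        return result

    def listdir(self, path: str) -> Set[str]:
        """Compatibility view for tree-based software: a directory is just a tag."""
        return self.query("directory:" + path)
```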
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
23. Re:Not A Nerd? by Jurily · 2010-01-14 16:37 · Score: 1
  
  And "system, bin", "system, sbin", "system, /usr/bin", "system, /usr/sbin"... How would this be better than a tree-based structure again?
  Never mind the fact that we could actually do all the things that are nice in a tag-based system right now. Just think about symlinks for a second.
24. Re:Not A Nerd? by mlts · 2010-01-14 16:40 · Score: 1
  
  I'm pretty sure they would move to it, or at least seriously consider it, once it becomes stable, perhaps a little after that so hidden gotchas can be found and squashed. Even now, in the unstable/testing phase, it has remarkable performance and stability features. I can't wait until it gets into the stable phase.
25. Re:Not A Nerd? by MBGMorden · 2010-01-14 17:29 · Score: 1
  
  It's not better for system files - it's just a way to model the old system for compatibility reasons. For data files, though, it would be revolutionary.
  As to symlinks, it doesn't work as well. I actually do this in a limited fashion on my system, but the reality is you'd have to create dummy directories and pre-think every possible combination of criteria you'd want ahead of time and create links for it. It'd be a literal mess.
  Consider smart playlists in iTunes. I can immediately create a list that says give me all songs from the 1990s, by the band "Ooga Booga", in the Soundtrack genre. It'll create that immediately - AND it'll keep it up to date so long as the files have accurate metadata. It works GREAT. The problem, though, is that it is app-specific, limited to the keywords they have defined for you, and only applies to music. That type of capability built into the filesystem and usable from any app would be wonderful.
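  The difference from a symlink farm is that a smart playlist stores the query, not its results. A minimal Python sketch of that idea, assuming metadata has already been read into simple records (the `Track` and `SmartList` names are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Track:
    title: str
    artist: str
    genre: str
    year: int


class SmartList:
    """A saved query: keeps the predicate, not a snapshot of the results."""

    def __init__(self, predicate: Callable[[Track], bool]) -> None:
        self.predicate = predicate

    def evaluate(self, library: Iterable[Track]) -> List[Track]:
        # Re-run the query against current metadata every time it is opened.
        return [t for t in library if self.predicate(t)]


# The example from the comment: 1990s soundtracks by "Ooga Booga".
nineties_ooga = SmartList(
    lambda t: 1990 <= t.year <= 1999 and t.artist == "Ooga Booga" and t.genre == "Soundtrack"
)
```

  Because the predicate is re-evaluated on each call, a newly added 1999 soundtrack shows up without anyone pre-creating a link for it.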
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
26. Re:Not A Nerd? by Anonymous Coward · 2010-01-14 18:11 · Score: 1, Insightful
  
  It's called Nepomuk, you'll find it in KDE4.
  The big problem with tagging is that it is essentially useless since you are going to have to tag every file yourself. Nepomuk scrapes text documents and even code files but music/video/photos will just get you the filename, the contents of the EXIF/ID3 tags and that's it. No-one has that sort of patience, at least, if they have any sort of sizable collection.
27. Re:Not A Nerd? by moosesocks · 2010-01-14 19:08 · Score: 1
  
  ZFS can also run inside a FUSE module on linux. I use it for managing my NAS and backup pools.
  The performance isn't great, although it's perfectly adequate for my needs -- having the awesome volume-management capabilities is more than a worthwhile tradeoff. Sun's continually making improvements to ZFS, while the ZFS-fuse team have been working on the performance angle.
  Word has it that a private company is also working on a cleanroom implementation of ZFS for the Linux kernel, which should be free of licensing issues. (Of course, one could question the necessity of this effort, as Btrfs should have most of the features that make ZFS desirable by the time it's done.)
  
  --
  -- If you try to fail and succeed, which have you done? - Uli's moose
28. Re:Not A Nerd? by Jurily · 2010-01-14 20:50 · Score: 1
  
  so long as the files have accurate metadata.
  Ever had an mp3 player that forced the tag structure on you while your songs weren't perfectly tagged?
  Seriously, show me one player that knows "Tupac Shakur", "Tupac", "2Pac", and "2pac" are the same artist, or one that can figure out if there are two artists in the tag.
29. Re:Not A Nerd? by cc1984_ · 2010-01-14 21:30 · Score: 1
  
  I'm not sure it makes any real difference. Isn't an EXT3 /boot partition read as EXT2 at boot, and then almost never written to the rest of the time?
  EXT3 takes up more space on the disk because of its journal.
30. Re:Not A Nerd? by X0563511 · 2010-01-14 22:04 · Score: 1
  
  I've been using XFS at home for about a year now. I've got a UPS, but it only lasts about 5 minutes. It's enough that little blips from stuff turning on/off don't bother it.
  No issues yet.
  
  --
  For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
31. Re:Not A Nerd? by Fred_A · 2010-01-14 23:00 · Score: 1
  
  I tried TagFS, and I found the main problem is that the tagging is way too much work to get to the level of tagging I want.
  While tagging has always been (and presumably always will be) the major hurdle in document management, some types of documents are self-tagged.
  For example, photos already carry some metadata through EXIF (and now XMP). So do a number of media files (music mostly). So those can be auto-indexed fairly easily. So can any data that holds explicit information (text), more or less.
  If this were easily made available in a database thingy at the filesystem level instead of at the application level, we could avoid multiple copies of the data: it would live once in the file and once in the file system, instead of once in every app that uses it (which is usually done to speed up data retrieval).
  Anyway, knowing that Ext4 is good enough for Google is a good step forward (although I'll have to see what options they used) in our never-ending quest towards data preservation.
  
  --
  
  May contain traces of nut.
  Made from the freshest electrons.
32. Re:Not A Nerd? by drinkypoo · 2010-01-14 23:45 · Score: 1
  
  You could do this today with current filesystems that support ACLs, xattrs, and metadata, but there are no tools to handle the user side. It's just like SELinux, which was supposed to usher in a new age of Linux security.
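  For what it's worth, the storage half already exists on Linux: Python exposes extended attributes through `os.setxattr` and `os.getxattr` (Linux only). A minimal sketch, assuming tags are kept as a comma-separated list in a hypothetical `user.tags` attribute. Unprivileged programs must use the `user.` namespace, and ext3/ext4 of this era typically needed the `user_xattr` mount option for it.

```python
import os
from typing import Iterable, Set

TAG_ATTR = "user.tags"  # hypothetical attribute name; "user." is the unprivileged namespace


def set_tags(path: str, tags: Iterable[str]) -> None:
    """Store tags as a comma-separated xattr (Linux only; tags must not contain commas).

    Raises OSError if the filesystem or mount does not support user xattrs.
    """
    value = ",".join(sorted(set(tags))).encode("utf-8")
    os.setxattr(path, TAG_ATTR, value)


def get_tags(path: str) -> Set[str]:
    """Read the tags back; an absent or unsupported attribute means no tags."""
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError:  # e.g. ENODATA (never tagged) or ENOTSUP (no xattr support)
        return set()
    return set(raw.decode("utf-8").split(",")) if raw else set()
```

  The missing piece is exactly the user side: file managers and shells know nothing about such an attribute, and plain GNU `cp` does not preserve extended attributes unless asked to.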
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
33. Re:Not A Nerd? by tenco · 2010-01-14 23:46 · Score: 1
  
  Lately they're getting more and more. It's awful.
34. Re:Not A Nerd? by drseuk · 2010-01-14 23:54 · Score: 1
  
  Ah, you'll be needing our DFS [Dog File System] then which is highly optimised for exactly such queries - unfortunately we lost the source during a spate of flying chairs from a well-known UK furniture retailer.
35. Re:Not A Nerd? by gmack · 2010-01-15 00:33 · Score: 1
  
  That works because iTunes recognizes the metadata in the file formats it works with. Would you teach the OS to recognize file formats and bloat the OS, or would you embed the metadata in the FS and make it harder to keep the metadata during file transfers?
  The thing is, Linux already had such a thing fully implemented, but Linus nixed it on the grounds that the idea was a bad one.
36. Re:Not A Nerd? by MBGMorden · 2010-01-15 01:11 · Score: 1
  
  Seriously, show me one player that knows "Tupac Shakur", "Tupac", "2Pac", and "2pac" are the same artist, or one that can figure out if there are two artists in the tag.
  Seriously not a problem for me, as I have specifically made it a point to keep the metadata clean on my MP3s (and to add album artwork). If I notice that a song uses a different variation or spelling than everything else for one field, then I'll simply edit the metadata.
  Not being able to figure out if there are two artists in one tag is a weakness of the system as designed - rather than being a field for a single value, each field SHOULD be designed to hold a list of values. Unfortunately MP3s weren't designed that way (it could likely be simulated, but without it being in the actual spec, getting everyone to do it the same way would be almost impossible), but if we were redoing a metadata system from the ground up, then that could be easily implemented.
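  For what it's worth, the ID3v2.4 spec does allow a text frame to hold several values separated by NUL characters, though older tags usually cram everything into one string. The Python sketch below treats the artist field as a list: it splits on NUL plus a couple of common ad hoc separators (a heuristic that will mangle some band names), then maps spellings onto one canonical name through an alias table. The `ALIASES` table and helper names are hypothetical; a real player would need a curated database for the mapping.

```python
import re
from typing import List

# Hypothetical alias table mapping lower-cased spellings to one canonical name.
ALIASES = {
    "tupac": "2Pac",
    "tupac shakur": "2Pac",
    "2pac": "2Pac",
}

# NUL is the ID3v2.4 multi-value separator; ";" and " feat. " are common ad hoc ones.
SEPARATORS = re.compile(r"\x00|;| feat\. ", re.IGNORECASE)


def canonical(name: str) -> str:
    cleaned = " ".join(name.split())  # collapse stray whitespace
    return ALIASES.get(cleaned.lower(), cleaned)


def artists(raw_field: str) -> List[str]:
    """Parse one raw artist field into a de-duplicated list of canonical names."""
    result: List[str] = []
    for part in SEPARATORS.split(raw_field):
        if part.strip():
            name = canonical(part)
            if name not in result:
                result.append(name)
    return result
```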
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
37. Re:Not A Nerd? by MBGMorden · 2010-01-15 01:15 · Score: 1
  
  Personally I wouldn't have a problem keeping it at a filesystem level locally. When I download a file I already choose what directory it goes into; assigning metadata is, to me, much the same.
  As to it already having been implemented, I certainly haven't seen anything, and regardless, Linus doesn't quite hold the mystical reality beam over the Linux populace that Jobs does over Mac users - if he says something is a bad idea, I certainly don't mind disagreeing with him.
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
38. Re:Not A Nerd? by MBGMorden · 2010-01-15 01:17 · Score: 1
  
  I'll check that out. Personally I don't mind manual tagging so long as it lets me assign tags to multiple files at once and the tags follow the file after I move it.
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
39. Re:Not A Nerd? by marcosdumay · 2010-01-15 07:42 · Score: 1
  
  Ext3 can do that, it is just that applications don't support it.
  
  --
  Rethinking email
40. Re:Not A Nerd? by RichiH · 2010-01-16 05:52 · Score: 1
  
  Maybe he needs the drivers, prefers the GNU stack or wants a particular packaging system?
Digitzor link useless by autocracy · 2010-01-14 08:58 · Score: 5, Informative

I managed to ease a pageview out of it. That said, the /. summary says all they say, and you're all better served by the source they point to, which is what SHOULD have been in the article summary instead of the Digitzor site.
See http://lists.openwall.net/linux-ext4/2010/01/04/8

--
SIG: HUP
1. Re:Digitzor link useless by ShadowRangerRIT · 2010-01-14 09:01 · Score: 1
  
  Mod parent Informative please. It's a good link, particularly with the /.ing of the original article link.
  
  --
  $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
Ted Ts'o by RPoet · 2010-01-14 08:59 · Score: 4, Informative

They have Ted Ts'o of Linux filesystem fame working for them now.

--
"Oppression and harassment is a small price to pay to live in the land of the free." -- Montgomery Burns.
1. Re:Ted Ts'o by Don_dumb · 2010-01-15 01:20 · Score: 1
  
  And he's posted to this discussion in several places (tytso (63275)).
  So for once on /., real information and insight.
  
  --
  If this were really happening, what would you think?
Btrfs? by Wonko+the+Sane · 2010-01-14 09:00 · Score: 2, Interesting

I guess they didn't consider btrfs ready enough for benchmarking yet.
1. Re:Btrfs? by fuzzyfuzzyfungus · 2010-01-14 09:10 · Score: 2, Funny
  
  I wonder if oracle is really bttr about their rejection?
2. Re:Btrfs? by Paradigm_Complex · 2010-01-14 09:11 · Score: 5, Informative
  
  From kernel.org's BTRFS page:
  
  Btrfs is under heavy development, and is not suitable for any uses other than benchmarking and review. The Btrfs disk format is not yet finalized, but it will only be changed if a critical bug is found and no workarounds are possible.
  It's ready for benchmarking, it's just not ready for widespread use yet. If Google was looking for a filesystem to make a switch to in the near future, BTRFS simply isn't an option quite yet.
  
  It's really easy at this point to move from EXT2 to EXT4 (I believe you can simply remount the partition as the new filesystem, maybe change a flag or two, and away you go). It's basically free performance. If Google is convinced it's stable, there isn't much reason not to do this. It could act as an interim filesystem until something significantly better - such as BTRFS - gets to the point where it's dependable. The fact BTRFS was not mentioned here doesn't mean it's completely ruled out.
  
  --
  "A witty saying proves nothing." - Voltaire
3. Re:Btrfs? by Tubal-Cain · 2010-01-14 09:17 · Score: 2, Insightful
  
  The chances of them using it would be pretty much nil. They are switching from ext2, and ext4's been "done" for over a year now. I'm sure they have a few benchmarks of btrfs, just not on as large of a scale as these tests were.
4. Re:Btrfs? by Anonymous Coward · 2010-01-14 09:37 · Score: 1, Interesting
  
  Ext3 is just a couple flags added to ext2. For ext4, if you want to take advantage of its features, you have to start from scratch. However, I don't think this is an issue for Google, as they have a ton of redundancy.
5. Re:Btrfs? by Korin43 · 2010-01-14 09:42 · Score: 1
  
  It sounds like just mounting an ext2 partition as ext4 should give some performance increase, but it won't be able to use extents, which are apparently a big deal.
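  Roughly why extents matter: ext2/ext3 record one pointer per data block (via indirect blocks), while ext4 records runs of contiguous blocks. The Python sketch below collapses a per-block map into (logical start, physical start, length) runs, assuming a file with no holes and capping each run at 32,768 blocks, the ext4 limit for an initialized extent. `to_extents` is an illustrative helper, not e2fsprogs code.

```python
from typing import List, Tuple

MAX_EXTENT_LEN = 32768  # ext4 limit for an initialized extent, in blocks


def to_extents(block_map: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse an ext2/ext3-style per-block map into ext4-style extents.

    block_map[i] is the physical block holding logical block i (no holes).
    Returns (logical_start, physical_start, length) runs.
    """
    extents: List[Tuple[int, int, int]] = []
    for logical, physical in enumerate(block_map):
        if extents:
            l0, p0, n = extents[-1]
            # Extend the current run if this block is physically contiguous.
            if physical == p0 + n and n < MAX_EXTENT_LEN:
                extents[-1] = (l0, p0, n + 1)
                continue
        extents.append((logical, physical, 1))
    return extents
```

  A contiguous 40,000-block file becomes two extents instead of 40,000 block pointers, which is a large part of why extents help with big files.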
6. Re:Btrfs? by StarHeart · 2010-01-14 09:48 · Score: 4, Informative
  
  You don't have to start from scratch. You just have to enable the extents feature. It won't auto convert the old stuff, but any time something is changed it will be made into an extent.
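  The "flag or two" looks roughly like this. The Python sketch below only builds the command lines (it runs nothing), following the tune2fs/e2fsck recipe commonly given for converting ext3 to ext4. The device path is a placeholder, e2fsck must be run on an unmounted filesystem, and you should check tune2fs(8) and e2fsck(8) for your e2fsprogs version before running anything. Enabling the features does not rewrite existing data.

```python
from typing import List


def ext4_upgrade_commands(device: str, add_journal: bool = False) -> List[List[str]]:
    """Return (not run) the commands for an in-place ext2/ext3 -> ext4 feature upgrade."""
    cmds: List[List[str]] = []
    if add_journal:
        cmds.append(["tune2fs", "-j", device])  # ext2 has no journal of its own
    # Turn on the ext4 features; only data written afterwards uses extents.
    cmds.append(["tune2fs", "-O", "extents,uninit_bg,dir_index", device])
    # Forced check afterwards, as the usual recipe requires; -D optimizes directories.
    cmds.append(["e2fsck", "-f", "-D", device])
    return cmds
```

  Leaving `add_journal` off matches the no-journal ext4 setup Google is said to use; an ext2 filesystem that should gain a journal adds `tune2fs -j` first.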
  
  --
  Havoc Penington, the bane of my Linux desktop.
7. Re:Btrfs? by Lennie · 2010-01-14 10:11 · Score: 1
  
  If they choose the ext-family upgrade path, btrfs is also still possible in the future. You can even do an in-place upgrade from ext2, 3 (and probably 4, but I didn't see it in the text where I read about this feature) to btrfs.
  
  Not that it matters; I'm fairly sure they don't do in-place upgrades. At least with ext4, if you want to benefit the most from performance and features, if I remember correctly, you should create a new filesystem, not do an in-place upgrade.
  
  --
  New things are always on the horizon
8. Re:Btrfs? by GooberToo · 2010-01-14 10:38 · Score: 1
  
  That's right. Remounting ext2/3 as ext4 does not provide all the performance benefits. To truly gain the performance boost, you must format as ext4.
  Also, when you mount ext2/3 as ext4, depending on the mount options, you may not be able to roll back to ext2/3 if you don't like how the ext4 mount experiment goes.
9. Re:Btrfs? by shish · 2010-01-14 11:04 · Score: 1
  
  Btrfs does have a giant list of really cool features; but from what I've seen of google's needs, they're at the complete opposite end of the spectrum (I'm surprised that they're using a filesystem at all, when they could just dump their data structure on the raw disk)
  
  --
  I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
10. Re:Btrfs? by dbIII · 2010-01-14 12:00 · Score: 1
  
  It's simple - copy off
  format
  copy back
  bummer!
  format again
  copy back
11. Re:Btrfs? by complete+loony · 2010-01-14 12:47 · Score: 1
  
  From google's point of view it only has to be stable enough. They don't care that much if a node goes down or a single copy of a block of data becomes unavailable. What they care about is aggregate throughput for the entire cluster.
  
  --
  09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
12. Re:Btrfs? by cgenman · 2010-01-15 00:50 · Score: 1
  
  It has been a while since I built a Linux system. Can someone comment on the specific advantages of EXT4 over EXT2?
  
  --
  The ______ Agenda
13. Re:Btrfs? by Simetrical · 2010-01-15 05:26 · Score: 1
  
  I guess they didn't consider btrfs ready enough for benchmarking yet.
  Aside from btrfs not being ready for production according to anyone, including its developers, it's probably not useful to Google. It has tons of awesome features, but they mostly make administration easier. Google already administers everything through their own user-space cross-computer filesystem, which can handle all their integrity/backup/live upgrade/etc. requirements much better than btrfs probably could. What they want is raw performance, and when btrfs is ready for prime time, it will probably beat ext4 on some benchmarks (especially if you have, e.g., a "file copy" benchmark and let btrfs use COW) but lose on others.
  
  --
  MediaWiki developer, Total War Center sysadmin
No ReiserFS? by CRCulver · 2010-01-14 09:00 · Score: 3, Interesting

It's interesting that ReiserFS wasn't even an option here. I myself ended up using Ext4 when I set up a new box not too long ago. It's a real shame that just because the creator of the filesystem committed a crime, people are inclined to treat the technology itself as somehow dishonored.
1. Re:No ReiserFS? by pdbaby · 2010-01-14 09:06 · Score: 3, Insightful
  
  ...or maybe the fact that he's no longer involved brings up questions about its future direction. I'm sure they took a look at reiserfs previously
  
  --
  Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
2. Re:No ReiserFS? by Anonymous Coward · 2010-01-14 09:06 · Score: 4, Funny
  
  ...maybe they felt it wasn't cutting edge enough.
3. Re:No ReiserFS? by Icarium · 2010-01-14 09:12 · Score: 1
  
  I'd imagine contacting a prison for tech support could be a bit awkward.
  (Yes, I know it's lame)
4. Re:No ReiserFS? by icepick72 · 2010-01-14 09:28 · Score: 1
  
  The association is too close in this case because a murderer's name is part of the file system name. If the product had been named something else the association wouldn't be there. Might as well stock the shelves with Bernardo Bath Oil and Dahmer Doodads. How well do you think that would go in the eyes of the corporate world? So it's not because the creator of the filesystem committed a crime, it's because the product has an unsavoury name - those are two distinct and unrelated issues.
5. Re:No ReiserFS? by jspenguin1 · 2010-01-14 09:35 · Score: 5, Funny
  
  They need to change the name... How about
  Object-oriented
  Journalled
  File
  System?
6. Re:No ReiserFS? by KlomDark · 2010-01-14 09:44 · Score: 1
  
  // Came here for the Reiser reference //// Not leaving disappointed! ////// Oops, this aint Fark...
7. Re:No ReiserFS? by gmuslera · 2010-01-14 10:10 · Score: 2, Funny
  
  To make the move to this new filesystem, they hired Ted Ts'o (current maintainer of ext4). Hans wasn't available for the moment, and it would be bad to have a famous employee that, well, did evil.
8. Re:No ReiserFS? by metamatic · 2010-01-14 11:00 · Score: 1
  
  Hans wasn't available for the moment, and would be bad to have a famous employee that, well, did evil.
  Google hires ex-Microsoft employees all the time.
  
  --
  GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
9. Re:No ReiserFS? by pHus10n · 2010-01-14 11:20 · Score: 2, Funny
  
  I thought ReiserFS would be the "killer app" for Google...
10. Re:No ReiserFS? by mqduck · 2010-01-14 11:51 · Score: 3, Interesting
  
  So it's not because the creator of the filesystem committed a crime, it's because the product has an unsavoury name
  Actually, it's more likely because the creator and main developer of the filesystem is suddenly gone. As I understand it, he wasn't a very friendly guy (surprise!) and drove others away from the project.
  
  --
  Property is theft.
11. Re:No ReiserFS? by TheRaven64 · 2010-01-14 12:26 · Score: 2, Insightful
  
  They've never hired anyone from the Windows ME team though, only people who did the sort of everyday low-grade evil, nothing too heinous.
  
  --
  I am TheRaven on Soylent News
12. Re:No ReiserFS? by Xabraxas · 2010-01-14 14:50 · Score: 1
  
  The problem with ReiserFS is that Reiser3 is old and lacking features compared to other filesystems like XFS and EXT4. Reiser4 isn't a part of the kernel and probably never will be, so that could end up being quite problematic, especially in the future.
  
  --
  Time makes more converts than reason
13. Re:No ReiserFS? by anomaly65 · 2010-01-14 17:43 · Score: 1
  
  reiserfs while good only runs in a chroot'ed jailed file system ;-)
14. Re:No ReiserFS? by Joey+Vegetables · 2010-01-15 01:09 · Score: 1
  
  Perhaps if they decide to make heavy use of these, they might reconsider.
  Seriously . . . you want something as important and heavily used as a filesystem to be as future-proof as possible, and there remains serious question about who will maintain reiser4 going forward. Ext4 is a stepping-stone to btrfs, which seems to have a bright future, and incorporates many of the same ideas as reiserfs.
  
  --
  
  Nonaggression works!
15. Re:No ReiserFS? by bluefoxlucid · 2010-01-15 05:38 · Score: 1
  
  Creating ReiserFS was a huge offense and it's appropriate to banish both Hans and the file system itself to the void.
  
  --
  Support my political activism on Patreon.
16. Re:No ReiserFS? by bluefoxlucid · 2010-01-15 05:56 · Score: 1
  
  Yeah... he's always resisted implementing file systems the "right" way. Xattrs? Psh, waste of time. He implemented "Plug-Ins" in Reiser4 and said "Add it yourself." One time that guy offered me a job and I told him to shove his dated, broken-by-design file shitstorm up his ass and take his big empty head elsewhere. Alas... he does not take stress well.
  
  --
  Support my political activism on Patreon.
17. Re:No ReiserFS? by bluefoxlucid · 2010-01-15 05:59 · Score: 1
  
  Rehahahaha... you're not American are you? Prisons aren't for rehabilitation, silly Eurotwit. You throw scum there to rot for an imaginary "fair" term, after which they will immediately commit another petty crime for which you apply maximized and trivially compound sentences to keep them there for eternity. They leave knowing only how to commit further crime and live in a prison, and also blackmarked and not able to get a job.
  
  --
  Support my political activism on Patreon.
Google doesn't need journaling? by Paradigm_Complex · 2010-01-14 09:00 · Score: 3, Interesting

The main advantage of EXT3 over EXT2 is that, with journaling, if you ever need to fsck the data, it goes a LOT quicker. It's interesting to note that Google never felt it needed that functionality.

Additionally, I was under the impression that Google used massive numbers of commodity consumer-grade hard drives, as opposed to high-grade stuff, which I presume is less likely to err. Couple this fact with the massive amount of data Google is working with and there has got to be a lot of filesystem errors, no?

Can anyone else with experience with big database stuff hint as to why Google would not need to fsck their data (often enough for EXT3 to be worthwhile)? Is it cheaper just to overwrite the data from some backup elsewhere at this scale? How do they know the backup is clean without fscking that?

--
"A witty saying proves nothing." - Voltaire
1. Re:Google doesn't need journaling? by spydum · 2010-01-14 09:06 · Score: 4, Informative
  
  Replicas stored across multiple servers -- if one is corrupted or unavailable requiring fsck, who cares? Ask the next server in line for the data.
2. Re:Google doesn't need journaling? by 42forty-two42 · 2010-01-14 09:09 · Score: 1
  
  First, Google's servers each have their own battery, so it's unlikely that all the servers in a DC will go down at once. If only a few go down, their redundancy means that it's not a big deal - they can wait for the fsck. And moreover, even if an entire DC goes down (e.g., due to cooling loss) they have the redundancy needed to deal with entire datacenter failures - with that kind of redundancy, fscking is only a minor inconvenience (plus with a cooling failure they might have time to sync and umount before poweroff...)
3. Re:Google doesn't need journaling? by ls671 · 2010-01-14 09:17 · Score: 1
  
  I always felt that an fsck that takes data already on the disk (the journal) into account was weaker than checking the filesystem independently (no journal), or at least that it would bring more possibilities of errors (e.g. errors in the journal itself). It may very well be an unjustified impression, but at least it seems logical at first glance: a simpler file system means less risk of bugs, etc.
  http://slashdot.org/comments.pl?sid=1511104&cid=30770742
  
  --
  Everything I write is lies, read between the lines.
4. Re:Google doesn't need journaling? by amRadioHed · 2010-01-14 09:32 · Score: 2, Informative
  
  If you lost power while the journal was being written and it was incomplete then the journal entry would just be discarded and your filesystem itself would be fine, it would just be missing the changes from the last operation before the crash.
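  A toy Python model of that recovery rule, assuming a journal of begin/write/commit records: replay applies only transactions whose commit record reached the journal, and anything cut off mid-transaction is dropped. This is a sketch of the commit-record idea, not the on-disk format of the ext3/ext4 journal, and `recover` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# ("begin", txid) | ("write", txid, block, contents) | ("commit", txid)
Record = Tuple[str, ...]


def recover(disk: Dict[str, str], journal: List[Record]) -> Dict[str, str]:
    """Replay a write-ahead journal after a crash (toy model).

    Writes are buffered per transaction and applied only when that
    transaction's commit record is found; a transaction cut off before its
    commit record is discarded, so the filesystem simply lacks that last change.
    """
    pending: Dict[str, List[Tuple[str, str]]] = {}
    for rec in journal:
        kind, txid = rec[0], rec[1]
        if kind == "begin":
            pending[txid] = []
        elif kind == "write" and txid in pending:
            pending[txid].append((rec[2], rec[3]))
        elif kind == "commit" and txid in pending:
            for block, contents in pending.pop(txid):
                disk[block] = contents  # replaying an already-applied write is harmless
    return disk
```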
  
  --
  We hope your rules and wisdom choke you / Now we are one in everlasting peace
5. Re:Google doesn't need journaling? by FlyingBishop · 2010-01-14 09:37 · Score: 1
  
  It's always rather curious to me when people re-state the last question in a post as a sentence when a simple 'yes' would have sufficed.
6. Re:Google doesn't need journaling? by Anonymous Coward · 2010-01-14 09:45 · Score: 1, Funny
  
  It is always rather curious to me when people use the phrase "it's always rather curious to me" when a simple "I hate it when" would have sufficed.
7. Re:Google doesn't need journaling? by crazyvas · 2010-01-14 09:50 · Score: 2, Informative
  
  They use fast replication techniques to restore disk servers (chunkservers in GFS terminology) when they fail.
  The failure could be because of a component failure, disk corruption, or even simply killing the process. The detection is done via checksumming (as opposed to fscking), which also takes care of detecting higher-level issues that fscking might miss.
  Yes, it is much cheaper for them to overwrite data from another replica (3 replicas per chunk is the default) using their fast re-replication techniques rather than trying to fsck.
  Check this paper out (see pdf link at bottom of page) under "Section 5: Fault Tolerance and Diagnosis" for more info:
  http://labs.google.com/papers/gfs.html
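  A simplified Python sketch of that read path, assuming each replica is held as (data, stored checksums). It follows the GFS paper's layout of one 32-bit checksum per 64 KB block, using CRC32 (the specific checksum function is an assumption here). In GFS the chunkserver verifies its own data and reports a mismatch to the master, which then re-replicates from a good copy; this sketch folds that into one function that returns the first verified copy plus the list of corrupt replicas.

```python
import zlib
from typing import Dict, List, Optional, Tuple

CHECKSUM_BLOCK = 64 * 1024  # GFS checksums each 64 KB block of a chunk


def block_checksums(data: bytes) -> List[int]:
    """One 32-bit checksum per 64 KB block (CRC32 is an assumption, not from the paper)."""
    return [zlib.crc32(data[i:i + CHECKSUM_BLOCK]) for i in range(0, len(data), CHECKSUM_BLOCK)]


def read_chunk(replicas: Dict[str, Tuple[bytes, List[int]]]) -> Tuple[Optional[bytes], List[str]]:
    """Return the first replica whose data matches its stored checksums,
    plus the replicas that failed verification (to be re-replicated)."""
    corrupt: List[str] = []
    for server, (data, stored) in replicas.items():
        if block_checksums(data) == stored:
            return data, corrupt
        corrupt.append(server)  # in GFS the chunkserver would report this to the master
    return None, corrupt
```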
8. Re:Google doesn't need journaling? by the_other_chewey · 2010-01-14 10:10 · Score: 1
  
  The main advantage of EXT3 over EXT2 is that, with journaling, if you ever need to fsck the data, it goes a LOT quicker. It's interesting to note that Google never felt it needed that functionality.
  Doing fsck runs is just not worth it for them. One of the first contributions from Google to the ext4 driver was the ability to run ext4 volumes without journaling: all the performance benefits of ext4, none of the performance penalties of journaling.
  
  If there is (possible) FS corruption, they just rebuild it from scratch from another copy of the data.
  
  This comes from Ted Ts'o's FOSDEM09 keynote, BTW. Very interesting talk.
9. Re:Google doesn't need journaling? by adolf · 2010-01-14 11:16 · Score: 1, Insightful
  
  It's always a rather curious occurrence to me when, in times when one is complaining about specific instances of text which are lacking brevity, the prose that the complainant themselves produce uses "it is" instead of "it's."
  
  --
  Kid-proof tablet..
10. Re:Google doesn't need journaling? by tytso · 2010-01-14 11:55 · Score: 4, Interesting
  
  So there's a major problem with Soft Updates, which is that you can't be sure that data has hit the disk platter and is on stable store unless you issue a barrier operation, which is very slow. What Soft Updates apparently does is assume that once the data is sent to the disk, it is safely on the disk. But that's not a true assumption! The disk drive, especially modern ones with large caches, can reorder writes which are sent to the disk, sometimes (with the right pathological workloads) for minutes at a time. You won't notice this problem if you just crash the kernel, or even if you hit the reset button. But if you pull the plug or otherwise cause the system to drop power, data in the disk's write cache won't necessarily be written to disk. The problem that we saw with journal checksums and ext4 only showed up on a power drop, because there was a missing barrier operation, so this is not a hypothetical consideration.
  In addition, if you have a very heavy write workload, the Soft Updates code will need to burn a fairly large amount of memory tracking the dependencies and quite a bit of CPU figuring out which dependencies need to be rolled back. I'm a bit suspicious of how well they perform and how much CPU they steal from applications --- which, granted, may not show up in benchmarks which are disk bound. But if the applications or the large number of jobs running on a shared machine are trying to use lots of CPU as well as disk bandwidth, this could very much be an issue.
  BTW, while I was doing some quick research for this reply, it seems that NetBSD is about to drop Soft Updates in favor of a physical block journaling technology (WAPBL), according to Wikipedia. They didn't give a reference for this, nor did they say why NetBSD was planning on dropping Soft Updates, but there is a description of the replacement technology here: http://www.wasabisystems.com/technology/wjfs. But if Soft Updates is so great, why is NetBSD replacing it, and why did FreeBSD add a file system journaling alternative to UFS?
11. Re:Google doesn't need journaling? by TheRaven64 · 2010-01-14 12:40 · Score: 1
  
  I'm not sure how you think fsck or journaling work...
  With a tool like fsck, it starts at the root inode of a filesystem and then walks the tree, looking for inconsistencies caused by writes that happened in the wrong order. For example, it does garbage collection so that inodes that have a reference count greater than 0 but which are not actually referenced in any directory entry are removed (or moved to lost+found, where you can check whether they are parts of a file that you didn't mean to delete). It will also check that the amount of free space matches the size of the disk minus the size of the files it can find.
  With journaling, this is simpler because far fewer things can go wrong. With a journaling FS, you first write a record to disk saying what change you are going to make, then you make the change, then you erase that journal record. If the power fails before you write the journal record, you lose the transaction. If it fails after the journal write but before the change is complete, you may be able to replay the transaction from the journal (if it's something simple). If it fails after the change but before the journal record is erased, then you look in the journal, see from the on-disk state that the change has already been made, and delete the journal record.
  As a simple example, consider moving a file from one directory to another. You need to add an entry to the target directory, then remove the entry from the old one. In between, the file is referenced in two places but its reference count is still one, so unlinking it from either directory would free the inode and leave a dangling reference in the other. If the power fails at this point, fsck will walk the directory tree, find two references to the same inode, and either delete one of them or increment the reference count of the file and report an error (this behaviour is implementation dependent; your fsck may do something completely different, this is just an example).
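  To make that fsck walk concrete, here is a minimal Python sketch of the link-count cross-check only, assuming a toy in-memory model where each directory is a dict of entry name to inode number and `link_counts` holds each inode's stored reference count. The function name and the model are hypothetical; a real fsck runs many more checks than this one.

```python
from collections import Counter


def check_link_counts(directories, link_counts):
    """Cross-check stored inode link counts against directory entries.

    directories: {dir_path: {entry_name: inode_number}}
    link_counts: {inode_number: stored reference count}
    Returns a list of (inode, problem) tuples; fixing them is left to the caller.
    """
    references = Counter()
    for entries in directories.values():          # walk every directory entry
        for inode in entries.values():
            references[inode] += 1

    problems = []
    for inode, stored in sorted(link_counts.items()):
        found = references[inode]                 # 0 if never referenced
        if found == 0 and stored > 0:
            problems.append((inode, "orphan: candidate for lost+found"))
        elif found != stored:
            problems.append((inode, f"link count {stored}, found {found} references"))
    for inode in sorted(set(references) - set(link_counts)):
        problems.append((inode, "entry points at an unallocated inode"))
    return problems
```

  Fed the mid-move state described above (`{"/a": {"f": 12}, "/b": {"f": 12}}` with `{12: 1}`), it reports inode 12 as having a link count of 1 but 2 references, and leaves the choice of repair to the caller, since that part is implementation dependent.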
  With journaling, you first write something in the log saying which file you are moving. Then you update the target directory, then you update the source directory, then you update the journal again to say that you've done it. This time, when the power goes out in the middle, fsck can look at the journal and immediately see the two directories that are in an inconsistent state. First it will check the target directory, and if the file isn't referenced there then it will add a reference. Then it will check the source directory and remove the entry there, if it exists. Then it will delete the journal entry. At every point in the initial operation, there was enough information on disk to complete the operation entirely. Without the journal, fsck could only find a bit of the filesystem that was inconsistent; it still needed to employ heuristics to guess what the correct state should have been.
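  The journaled version of the same move can be sketched in the same toy model. This is a minimal logical (intent) journal, assuming directories are dicts of name to inode number; `move`, `recover` and the `crash_after` switch are hypothetical names used only to simulate a power failure after each step. What it shows is that recovery checks the on-disk state before each step, so replaying a record whose work was already partly or fully done is harmless.

```python
class Crash(Exception):
    """Simulated power failure between two steps."""


def move(fs, journal, src_dir, dst_dir, name, crash_after=None):
    """Move `name` from src_dir to dst_dir using a write-ahead intent record.

    `crash_after` ("log", "add" or "remove") simulates losing power right
    after that step; None runs the operation to completion.
    """
    inode = fs[src_dir][name]
    record = ("move", src_dir, dst_dir, name, inode)
    journal.append(record)            # 1. log what we are about to do
    if crash_after == "log":
        raise Crash
    fs[dst_dir][name] = inode         # 2. add the entry to the target
    if crash_after == "add":
        raise Crash
    del fs[src_dir][name]             # 3. remove the entry from the source
    if crash_after == "remove":
        raise Crash
    journal.remove(record)            # 4. operation done: retire the record


def recover(fs, journal):
    """Redo every outstanding record, oldest first, then clear the journal.

    Each step checks the current state before acting, so a record whose
    work was already partly or fully done can be replayed safely.
    """
    for _op, src_dir, dst_dir, name, inode in journal:
        if fs[dst_dir].get(name) != inode:
            fs[dst_dir][name] = inode
        if fs[src_dir].get(name) == inode:
            del fs[src_dir][name]
    journal.clear()
```

  Crashing after any of the three steps and then calling `recover` leaves the file in the target directory only, with an empty journal.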
  The fsck tool isn't magic. It knows a bit about what the filesystem is meant to look like, and tries to ensure that it really does look like that, but it doesn't always have enough information to get things right.
  
  --
  I am TheRaven on Soylent News
12. Re:Google doesn't need journaling? by Anonymous Coward · 2010-01-14 13:10 · Score: 2, Interesting
  
  Well, the performance is not that easy to compare in pure theory. SU will often require fewer writes than journaling, but SU requires that complex dependency tracking.
  About the barriers: is it really that different for journaling file systems? If the disk drive can change the order of the operations, that surely has an impact on journaling file systems. The journal would be quite useless if the transaction it represents were committed before it is logged in the journal. That way the operation could be committed halfway, and there would be no journal entry to roll it back or complete it. Maybe I am wrong, but you would need a barrier for every operation with the journal.
  About the BSDs:
  I found two reasons for NetBSD switching to WAPBL. Their implementation of soft updates (called softdeps) seems to be buggy in some corner cases. Journaling is less complex and easier to get right, while having similar performance characteristics. They often cite performance statistics where WAPBL wins by about 10%-15%. But that is not very solid; it only covers one usage pattern. The research I know of usually shows that in general journaling and soft updates are very similar, with each one winning in some patterns. I think using the simpler solution is really the right choice for a project like NetBSD.
  Journaling in FreeBSD is another quite interesting story. Journaling for FreeBSD is implemented in GEOM. My knowledge here is really limited, but GEOM acts below the file system. So the implementation in GEOM could provide journaling for every file system, like GEOM can provide encryption for every file system. AFAIK journaling in GEOM provides hooks that are used by UFS. I don't know why, but my guess is performance improvements.
  Additionally there is this quite new UFS SU+J implementation. That is UFS with soft updates and a journal to keep track of the freed space.
  What I really am ranting about is that for Linux this hasn't even been tried. Although there are loads of Linux file systems, there isn't much innovation going on. Really the only reason I found was that soft updates is complex. At least BtrFS comes with copy-on-write.
13. Re:Google doesn't need journaling? by ls671 · 2010-01-14 13:43 · Score: 1
  
  I am very familiar with transactional behavior with regard to disk writes and journals. I learned this many years ago while studying how database journaling works.
  I also had to manually answer e2fsck questions after system crashes and evaluate the damage afterward in our early days, so I understand what you are saying. We don't have to do this anymore; we just restore from a consistent image.
  Most (if not all) of our critical data is in the database, which is redundant, with failover.
  I guess I was trying to say that several layers of journaling was useless in our use case.
  Maybe it is the same for Google. There will come a day when we will upgrade too, but the main cause might be lack of updates for ext2, not the need for a journaling file system.
  Journaling file systems are great for laptops and desktops. Then again, there has been talk for a while about merging database functionality into the file system; obviously we would then need a journaling file system. But until relational databases are replaced by functionality implemented at the file system level, a journaling file system has always seemed to me a duplication of functionality that our use case didn't require.
  The "wait until stable before jumping on the bandwagon" principle was also applied in our case. Maybe it was the same for Google. Nevertheless, Google's move would be a good indication that the ext4 file system drivers are ready for a try! ;-))
  
  --
  Everything I write is lies, read between the lines.
14. Re:Google doesn't need journaling? by D+Ninja · 2010-01-14 14:44 · Score: 1
  
  if you ever need to fsck the data
  My my! The things they're doing with porn these days!
15. Re:Google doesn't need journaling? by tytso · 2010-01-14 17:38 · Score: 2, Informative
  
  What Soft Updates apparently does is assume that once the data is sent to the disk, it is safely on the disk. But that's not a true assumption!
  Journaling, and every other filesystem, has exactly the same problem. If consistence is required, YOU MUST DISABLE THE CACHE, unless it is battery-backed, or you are willing to depend on your UPS. This is the penalty we take for devices which lie to the OS about flush operations and the like.
  Yes, there were, in the bad old days, devices which lied when the OS sent a flush cache command: to get a better Winbench score, they would cheat and not actually flush the cache. But that hasn't been true for quite a while, even for commodity desktop/laptop drives. It's quite easy to test; you just time how many single-sector writes, each followed by a cache flush command, you can send per second. In practice, it won't be more than, oh, 50-60 write barriers per second. In general, if you use a reputable disk drive, it supports real cache flush commands. My personal favorites are Seagate Momentus drives for laptops, and I can testify that they all handle cache flush commands correctly; I have quite a collection and it's really not hard to test.
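  A rough userspace approximation of that test, as a sketch for Unix-like systems: rewrite one 512-byte block and call `os.fsync()` after every write, then report the rate. Whether `fsync()` actually turns into a drive cache flush depends on the filesystem and its barrier setting, so this only approximates the single-sector-write-plus-flush test described above; `flushes_per_second` and the probe file name are hypothetical. On a single spinning disk, a rate far above the 50-60 per second quoted above would suggest the flushes are not reaching the platter (or that the path is not on a spinning disk at all).

```python
import os
import tempfile
import time


def flushes_per_second(path, seconds=2.0, block_size=512):
    """Rewrite one block at offset 0 and fsync() after every write.

    Returns completed write+fsync pairs per second. Whether fsync() reaches
    the platter depends on the filesystem issuing a cache flush (barrier).
    """
    block = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        count = 0
        start = time.monotonic()
        deadline = start + seconds
        while time.monotonic() < deadline:
            os.pwrite(fd, block, 0)   # same sector every time
            os.fsync(fd)              # ask for it to reach stable storage
            count += 1
        return count / (time.monotonic() - start)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A temporary directory is used only for convenience; /tmp may be tmpfs,
    # so pass a path on the drive you actually want to test.
    with tempfile.TemporaryDirectory() as tmp:
        rate = flushes_per_second(os.path.join(tmp, "probe"))
        print(f"{rate:.0f} write+fsync pairs per second")
```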
  The big difference between journalling and soft updates is we can batch potentially hundreds of metadata updates into a single journal transaction, and send down a single write barrier every few seconds. The journal commit is an all-or-nothing sort of thing, but that gives us reliability _and_ performance.
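  A toy model of that batching, assuming metadata updates arrive evenly over time and the journal commits every 5 seconds (the usual ext3/ext4 default `commit=` interval); `Journal` and `barriers_needed` are hypothetical names. It only counts barriers, comparing one barrier per individually ordered update against one barrier per journal commit.

```python
class Journal:
    """Toy group-commit journal: updates join the running transaction and a
    single commit (one barrier) makes the whole batch durable at once."""

    def __init__(self):
        self.running = []      # updates not yet committed
        self.committed = []    # each entry is one all-or-nothing transaction
        self.barriers = 0

    def update(self, block_no):
        self.running.append(block_no)

    def commit(self):
        if not self.running:
            return
        self.committed.append(tuple(self.running))
        self.barriers += 1     # one barrier per commit, however many updates
        self.running = []


def barriers_needed(n_updates, duration_s, commit_interval_s=5):
    """Compare barrier counts for n_updates spread evenly over duration_s.

    Returns (one barrier per ordered update, barriers with batched commits).
    """
    journal = Journal()
    next_commit = commit_interval_s
    for i in range(n_updates):
        t = i * duration_s / n_updates          # arrival time of update i
        while t >= next_commit:                 # periodic commit timer fires
            journal.commit()
            next_commit += commit_interval_s
        journal.update(i)
    journal.commit()                            # flush the final partial batch
    return n_updates, journal.barriers
```

  `barriers_needed(1000, 20)` returns `(1000, 4)`: a thousand barriers if every update is ordered individually, versus four when the same updates are batched into journal commits.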
  The problem with soft updates is that the relative ordering of most (if not all) metadata writes is important, and putting a write barrier between each metadata operation is Slow And Painful. Yes, you can disable the write cache, but then you give up a huge amount of performance as a result. With journaling we get the performance benefits of the write cache, and we only have to pay the cost of enforcing write ordering through the barrier once every few seconds.
  Of course, there are workloads where soft updates plus a disabled write cache might be superior. If you have a very metadata-intensive workload that also happens to call fsync() between nearly every metadata operation, then it would probably do better than a physical block journalling solution that used barrier writes but ran with the write cache enabled. But in the general case, if you take a more normal workload where fsync()s aren't happening _that_ often, and compare physical block journalling with a write cache and barrier ops against a Soft Updates approach with the write cache disabled, I'm pretty sure the physical block journalling approach will end up benchmarking better.
16. Re:Google doesn't need journaling? by tytso · 2010-01-14 17:52 · Score: 2, Interesting
  
  So I'm an engineer, not an academic. I'm not trying to get a Ph.D. The Keep It Simple, Stupid principle is an important one, especially since, as you say, "Journalling and Soft Updates have similar performance characteristics."
  If sometimes Journalling posts better benchmarks, and sometimes Soft Updates produces better results, but Soft Updates is hideously more complex, thus inhibiting new features such as ACLs and Extended Attributes (which appeared in BSD much later than in Linux, and I think Soft Updates made it much harder to find people capable of extending the file system) --- then the choice of the simpler technology seems to be obvious. The performance gains are a toss-up, and using a hideously complex algorithm for its own sake is only good if you are an academic gunning for a Ph.D. thesis or a paper publication, or if you are trying to ensure job security by implementing something so hard to maintain that only you and a few other people can hack it.
17. Re:Google doesn't need journaling? by butlerm · 2010-01-14 18:20 · Score: 1
  
  Journaling, and every other filesystem, has exactly the same problem. If consistence is required, YOU MUST DISABLE THE CACHE
  Not true; it depends on the filesystem. It is true with journaling filesystems that either a write barrier or a cache flush (one that is actually honored by the disk subsystem) must be issued after every journal commit to maintain metadata consistency in the event of a power loss.
  The only disadvantage of that is that it really slows down a strict implementation of fsync, unless you have a battery-backed RAID controller or place the journal on some sort of fast persistent storage like an SSD. The real question is how long it will be before spinning hard drives have a little bit of flash inside them so that they can internally journal all disk writes and honor "cache flush" commands with the alacrity of (much smaller and more expensive) SSDs.
18. Re:Google doesn't need journaling? by Jeff- · 2010-01-14 19:31 · Score: 5, Informative
  
  There's a lot of misinformation in this thread about softupdates. I only have so much time to reply so I'll hit a few key points. I'm the author of journaling extensions to softupdates so I have some experience in this area.
  This notion that softupdates was so complex and so inhibited new features in ffs is bogus. I've seen it repeated a few times. There simply was not much pressure for these features and the filesystem metadata did not support it until ufs2. The total amount of code dedicated to extended attributes in softupdates can't be more than 100 lines. ffs sees fewer features because we have fewer developers period.
  Furthermore, softupdates is just a different approach. It is no more complex than journaling. When I review a sophisticated journaling implementation such as xfs I see more lines of code dedicated to journaling and transaction management than softupdates requires for dependency tracking. I have worked on a number of production filesystems and while softdep is definitely not trivial, neither were any of the others unless you compare to synchronous ufs. I think a lot of people who are familiar with COW and Journaling are looking at this unfairly because they already know another system and forget how long it took to become comfortable with it.
  In cpu benchmarks softdep costs more than async ffs, this is true. However, rollbacks are actually quite infrequent because our buffercache attempts to write buffers without dependencies first. Generally there are enough of those which satisfy dependencies on other buffers that you can keep the pipeline busy. Looking at the code size and depth in any modern filesystem, it's clear that a lot of cpu is involved. Are journal blocks not consuming memory? Is the transaction tracking free? Most dependency structures are quite small compared to generating a copy of a metadata block for a journal write.
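  The "write buffers without dependencies first" idea can be sketched as repeated passes over the dirty buffers. This is a simplified illustration under stated assumptions, not FreeBSD's buffer cache code: `deps` maps each hypothetical dirty buffer to the buffers that must reach disk before it, and whatever is left at the end forms a cycle that real soft updates would break with a rollback, which is not modeled here.

```python
def write_order(deps):
    """Order dirty-buffer writes so each buffer goes out after the buffers
    it depends on.

    deps: {buffer: set of buffers that must reach disk first}. Dependencies
    on buffers that are not dirty are treated as already satisfied.
    Returns (write order, buffers left blocked by a dependency cycle).
    """
    pending = {buf: set(waits).intersection(deps) for buf, waits in deps.items()}
    order = []
    while True:
        # buffers with no outstanding dependencies can be written right away
        ready = sorted(buf for buf, waits in pending.items() if not waits)
        if not ready:
            break
        for buf in ready:
            order.append(buf)
            del pending[buf]
        for waits in pending.values():
            waits.difference_update(ready)   # those writes satisfy others
    # real soft updates would break what is left with a rollback (not modeled)
    return order, sorted(pending)
```

  With `{"dir_blk": {"inode_blk"}, "inode_blk": {"cg_bitmap"}, "cg_bitmap": set()}` (hypothetical buffer names), the order comes out as `cg_bitmap`, `inode_blk`, `dir_blk`, with nothing left blocked.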
  NetBSD abandoned softdep for something much simpler because they didn't have the resources to fix the bugs in it and they didn't incorporate fixes from FreeBSD. Their journaling implementation is similar to our gjournal which is mostly filesystem agnostic and does full block logging in a very simple fashion.
  The journaled filesystem project was started simply to get rid of fsck. I think this hybrid solution is very promising. It gives us a place to issue barriers which can affect arbitrary numbers of filesystem operations. The journal write overhead is much lower than with traditional journals.
  And regarding benchmarks; FreeBSD doesn't really have a comparably developed journaling filesystem to benchmark softdep against. I think it's unreasonable to compare linux with ext4 to FreeBSD with ffs+softdep for purposes of evaluating the filesystem design. Too many other factors come into play.
  You can read more about softdep journaling at http://jeffr_tech.livejournal.com/
  Thanks,
  Jeff
19. Re:Google doesn't need journaling? by joib · 2010-01-14 21:12 · Score: 1
  
  BTW, while I was doing some quick research for this reply, it seems that NetBSD is about to drop Soft Updates in favor of a physical block journaling technology (WAPBL), according to Wikipedia. They didn't give a reference for this, nor did they say why NetBSD was planning on dropping Soft Updates, but there is a description of the replacement technology here: http://www.wasabisystems.com/technology/wjfs. But if Soft Updates is so great, why is NetBSD replacing it
  While I'm a Linux user myself, I also happened to stumble upon this while surfing a few months ago. IIRC based on some blog and mailing list posts I read at the time, the fundamental problem with soft updates on NetBSD (or soft dependencies, softdeps as they call it) was that they could never get the code stable enough for production usage.
  Or to put it another way, soft updates work on FreeBSD because McKusick himself maintains the code. :)
20. Re:Google doesn't need journaling? by DrXym · 2010-01-14 21:34 · Score: 1
  
  The main advantage of EXT3 over EXT2 is that, with journaling, if you ever need to fsck the data, it goes a LOT quicker. It's interesting to note that Google never felt it needed that functionality.
  I wouldn't be surprised if most of the data is transient, so why bother to recover it? If necessary, reimage the base OS - the transient stuff is going to get overwritten anyway.
21. Re:Google doesn't need journaling? by bingoUV · 2010-01-15 01:43 · Score: 1
  
  There will come a day when we will upgrade too, but the main cause might be lack of updates for ext2, not the need for a journaling file system.
  Why do you need updates for ext2 if it is already working fine for you? Keep using it forever. It's not as if working code stops working without updates, is it? Or do you enjoy rebooting after updating the system?
  
  --
  Bingo Dictionary - Pragmatist, n. A myopic idealist.
22. Re:Google doesn't need journaling? by evilviper · 2010-01-15 03:14 · Score: 1
  
  Yes, you can disable the write cache, but then you give up a huge amount of performance as a result. With journaling we can get the performance benefits of writes, but we only have to pay the cost of enforcing write ordering through the barrier once every few seconds.
  While I will acknowledge that I'm not in a position to argue with you, I must point out that I've heard from SEVERAL (in fact, ALL who've made a statement on the subject up until now) file system writers and experts that, even with journaling, the write cache MUST be disabled to guarantee file system consistency. What do you know that they don't?
  For reference, it does appear those behind XFS disagree with you: http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F
  I'm sure, given time, I could find many others...
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
23. Re:Google doesn't need journaling? by tytso · 2010-01-15 03:20 · Score: 1
  
  Jeff,
  You may be correct in saying that if you compare the guts of Soft Updates with that of (say) the JBD/JBD2 layer in Linux, which is what is responsible for handling the physical block journalling for ext3/ext4, the complexities involved might not be that different.
  However, the difference comes when someone adds ACL support, or some other fs feature. When you are using physical block journalling, all you need to know is how many blocks a particular fs operation needs to dirty. That's it! With Soft Updates, you need to understand dependency diagrams and write code to implement rollbacks, etc. The person who is implementing the file system feature has to do many more things.
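  A minimal Python sketch of the "just count the blocks" idea, assuming a toy journal where each operation reserves a number of block credits up front and the journal logs whole block images without interpreting their contents. The class names are hypothetical, and only committed transactions are replayed.

```python
class CreditsExceeded(Exception):
    """An operation tried to dirty more blocks than it reserved."""


class BlockJournal:
    """Toy physical block journal: it stores whole block images and never
    needs to know what the bytes inside a block mean."""

    def __init__(self):
        self.running = {}      # block number -> newest image, not yet committed
        self.committed = []    # committed transactions, oldest first

    def start(self, credits):
        return Handle(self, credits)

    def commit(self):
        if self.running:
            self.committed.append(self.running)   # all-or-nothing unit
            self.running = {}


class Handle:
    """One filesystem operation, declared up front to dirty <= `credits` blocks."""

    def __init__(self, journal, credits):
        self.journal = journal
        self.credits = credits
        self.blocks = {}

    def dirty_block(self, block_no, image):
        if block_no not in self.blocks and len(self.blocks) >= self.credits:
            raise CreditsExceeded(f"reserved {self.credits} blocks")
        self.blocks[block_no] = bytes(image)     # log a full copy of the block

    def stop(self):
        self.journal.running.update(self.blocks)


def replay(journal, disk):
    """Recovery: copy every committed block image back to its home location."""
    for txn in journal.committed:
        for block_no, image in txn.items():
            disk[block_no] = image
```

  For example, a hypothetical ACL update that touches one inode block and one extended-attribute block would call `start(credits=2)`; dirtying a third, different block raises `CreditsExceeded`, while re-dirtying one of the two does not. The feature author never has to describe ordering between the blocks.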
  Now there are certainly downsides to doing physical block journalling. If you have workloads which are very high in metadata operations, physical block journalling will hurt. On the other hand, it's not clear how common such workloads are (although you can certainly find benchmarks that will stress that particular usage pattern). And in the face of hard drive errors, physical block journals can sometimes be better at recovering from certain failures than logical journalling or soft updates.
  Like many things, there are always tradeoffs around, and if the goal is to play the "my file system has a longer d*ck" game, it's almost always possible to find some benchmark which "proves" that one file system is better than another. Yawn...
24. Re:Google doesn't need journaling? by tytso · 2010-01-15 04:31 · Score: 1
  
  Read the answer to the FAQ very carefully. In fact, they agree with me:
  
  With a single hard disk and barriers turned on (on=default), the drive write cache is flushed before and after a barrier is issued. A powerfail "only" loses data in the cache but no essential ordering is violated, and corruption will not occur.
  In certain cases it might make sense to turn off barriers and disable write caches, if you are writing huge amounts of bulk data and very little metadata in a RAID array --- and that is what XFS is optimized for. But they didn't say anything which contradicted what I said, although the conclusions might have been a little confusing and not necessarily applicable in workloads other than XFS's original design point of really big RAID arrays to support writing really big data sets.
25. Re:Google doesn't need journaling? by evilviper · 2010-01-15 07:18 · Score: 1
  
  In fact, they agree with me
  Only on single-drives (which I really don't care about, a little I/O performance either way isn't even notable on a PC). On RAID arrays, they specifically say: "you have a very high chance of big data losses on a power outage."
  There's a big difference between might perform better in workload X, which you claim is their reasoning, and "big data losses", which is what they actually, literally say.
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
26. Re:Google doesn't need journaling? by Jeff- · 2010-01-15 11:48 · Score: 1
  
  "Like many things, there are always tradeoffs around, and if the goal is to play the "my file system has a longer d*ck" game, it's almost always possible to find some benchmark which "proves" that one file system is better than another. Yawn..."
  Really Ted, where did I mention that softdep was better? This is a bit inappropriate. You seem keen on convincing everyone that softdep is so terrible for what reason I can't imagine. I'm not knocking your work. I've read your blog a bit, you're doing some great stuff. I'm just trying to clear up misconceptions.
512 MB size limit (bug) gone? by Gothmolly · 2010-01-14 09:03 · Score: 1

Did they fix that nasty "if you have files > 512MB, kiss them goodbye" bug?

--
I want to delete my account but Slashdot doesn't allow it.
1. Re:512 MB size limit (bug) gone? by physburn · 2010-01-15 03:55 · Score: 1
  
  That's on Ubuntu; it might not be elsewhere. No, it isn't fixed on the current Ubuntu. So don't use ext4 on Ubuntu servers, or your .cpio and other gigabyte-plus files might disappear.
  ---
  Data Integrity Feed @ Feed Distiller
As impressively as each other?! WTF?! by Anonymous Coward · 2010-01-14 09:04 · Score: 4, Funny

From TFA:

In their benchmarking, EXT4 and XFS performed, as impressively as each other.
WTF kind of retarded sentence is that?! Did Rob Smith help you write that article?!
In their benchmarking of EXT4 and XFS, EACH performed as impressively as THE OTHER.
1. Re:As impressively as each other?! WTF?! by fm6 · 2010-01-14 10:00 · Score: 1
  
  I think you meant to say, "Well a monster that gigantic could only be defeated by an even equally gigantic monster!"
2. Re:As impressively as each other?! WTF?! by Itninja · 2010-01-14 10:04 · Score: 1
  
  Sorry but my internal lameness filter automatically ignores any sentence beginning with 'WTF'.
  
  --
  I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
3. Re:As impressively as each other?! WTF?! by mqduck · 2010-01-14 11:55 · Score: 3, Informative
  
  Simply removing the second comma would make the sentence entirely correct:
  "In their benchmarking, EXT4 and XFS performed as impressively as each other."
  Adding "each" would make it a bit clearer, but the meaning is already obvious. I don't know why you think it has to be "THE other".
  
  --
  Property is theft.
4. Re:As impressively as each other?! WTF?! by inKubus · 2010-01-14 20:18 · Score: 1
  
  Only if you're a passive-voiced idiot. "Journalists", use the active voice! Please!
  Correct: "EXT4 and XFS were equally impressive in benchmarking."
  Note that there are 8 words versus 12 yet somehow the idea is communicated more fully. This is because the reader does not have to pause to try to remember what the subject was (benchmarking or EXT4 and XFS or impressions).
  
  --
  Cool! Amazing Toys.
5. Re:As impressively as each other?! WTF?! by osu-neko · 2010-01-14 23:09 · Score: 1
  
  Only if you're a passive-voiced idiot. "Journalists", use the active voice! Please!
  Your point is taken.
  
  --
  "Convictions are more dangerous enemies of truth than lies."
6. Re:As impressively as each other?! WTF?! by ruukusama · 2010-01-15 03:58 · Score: 1
  
  You don't know what the passive voice is. "In their benchmarking, EXT4 and XFS performed as impressively as each other" is in the active voice.
Still on ext2 on servers by ls671 · 2010-01-14 09:05 · Score: 3, Insightful

We are still using ext2 on servers. Now I have an argument: if Google is still using ext2, maybe we aren't so foolish. We might upgrade some day, but it is not yet a priority. With a UPS and proper failover and backup procedures in place, I can't remember when a journaling file system would have helped us in any way. They do seem great for desktops/laptops, though.

--
Everything I write is lies, read between the lines.
1. Re:Still on ext2 on servers by Bill,+Shooter+of+Bul · 2010-01-14 09:33 · Score: 1
  
  Seriously? Being able to recover your data faster isn't a consideration? Or do you have a big SAN for all of the critical application data?
  
  --
  Well.. maybe. Or Maybe not. But Definitely not sort of.
It's Not Hans by TheNinjaroach · 2010-01-14 09:06 · Score: 4, Interesting

I too have abandoned ReiserFS, but it's not about the horrible crime Hans committed. It's about the fact that I don't think the company he owned (which developed ReiserFS) has a great future, so I foresee maintenance problems with that filesystem. Sure, somebody else can continue their work, but I'm not going to hold my breath.

--
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
1. Re:It's Not Hans by slimjim8094 · 2010-01-14 09:28 · Score: 1
  
  So it's indirectly about the horrible crime Hans committed, since that crime is why his company has a poor future and won't be maintaining Reiser for very long.
  
  --
  I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
2. Re:It's Not Hans by Enderandrew · 2010-01-14 09:37 · Score: 1
  
  ReiserFS is in mainline, and is maintained by the kernel developers. Reiser and Namesys all but abandoned it, which is one of many factors that kept the newer Reiser4 out of mainline, even though Reiser4 was superior to ReiserFS in many ways.
  
  --
  http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
3. Re:It's Not Hans by Rich0 · 2010-01-14 09:53 · Score: 1
  
  ReiserFS is in mainline, and is maintained by the kernel developers.
  So is OS/2 HPFS. On the one hand, that shows that ReiserFS will probably be supported almost forever. On the other hand, I'm not sure I'd be rolling it out for new deployments or applications unless you're in a very tight niche.
4. Re:It's Not Hans by Anonymous Coward · 2010-01-14 10:05 · Score: 1, Interesting
  
  If Google had found that it gave some badass speeds, they probably would have just picked up maintenance themselves.
5. Re:It's Not Hans by Lennie · 2010-01-14 10:12 · Score: 1
  
  Actually people are still working on getting it in mainline and they are making progress, although slowly if I'm not mistaken.
  
  --
  New things are always on the horizon
6. Re:It's Not Hans by diegocg · 2010-01-14 10:15 · Score: 4, Informative
  
  Reiserfs has been undermaintained for a long time AFAIK. When Hans started working on reiser4, he completely forgot about adding needed features to v3. The reiserfs disk format may be good, but the codebase is outdated. Ext4 has an ancient disk format in many ways, but the codebase is scalable: it uses delayed allocation, the block allocator is solid, xattrs are fast, etc. Reiserfs still uses the BKL, the xattr support that Suse added is said to be slow and not very pretty, it had problems with error handling, etc.
7. Re:It's Not Hans by sproingie · 2010-01-14 10:45 · Score: 1
  
  I'm frankly kind of surprised anyone was still running ReiserFS before he was in the news: its failure modes are spectacular, partition-eating things, the recovery tool never worked, and even recoverable errors take eons to fsck.
8. Re:It's Not Hans by dmomo · 2010-01-14 10:53 · Score: 1
  
  KillerFS may be more subtle!
9. Re:It's Not Hans by mqduck · 2010-01-14 11:46 · Score: 2, Interesting
  
  Personally, I think Hans should have been allowed to continue his work on ReiserFS while incarcerated. Better to let a guilty man contribute to society than do nothing but rot in prison, no?
  
  --
  Property is theft.
10. Re:It's Not Hans by Shimbo · 2010-01-14 13:23 · Score: 1
  
  I'm frankly kind of surprised anyone was still running ReiserFS before he was in the news.
  It was the default on SuSE at the time (barely), so you shouldn't be. Filesystem failure modes are always going to be fairly anecdotal*; I've never seen a really screwed filesystem, except where the drive was terminally ill.
  *Unless you're Google.
11. Re:It's Not Hans by rwa2 · 2010-01-14 15:13 · Score: 2, Informative
  
  I'm still running reiser3, and probably holding out for reiser4... it's been confusing since the benchmarks for the next-gen fs's have been all over the place, but some look promising:
  http://www.debian-administration.org/articles/388#comment_127
  I've always run software RAIDs to crank a bit more performance out of the slowest part of my system, and reiserfs3 has always worked better out of the box. I'd spent long hours tuning EXT3 stripe widths and directory indexes and stuff, and EXT3 always came out slower and more wasteful of space.
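  For reference, the two numbers usually tuned for ext2/3/4 on RAID are mke2fs's `stride` (the RAID chunk size in filesystem blocks) and `stripe_width` (stride times the number of data-bearing disks, which is half the disks for RAID10). Here is a small calculator as a sketch; the 64 KiB chunk and 4 KiB block in the example are assumed values, since the post doesn't give them, and the function name is hypothetical.

```python
# Data-bearing disks per RAID level for an array of n disks.
DATA_DISKS = {
    "raid0": lambda n: n,
    "raid10": lambda n: n // 2,   # half the disks hold mirror copies
    "raid5": lambda n: n - 1,     # one disk's worth of parity
    "raid6": lambda n: n - 2,     # two disks' worth of parity
}


def ext_raid_options(chunk_kib, block_kib=4, disks=4, level="raid10"):
    """Return (stride, stripe_width) in filesystem blocks for mke2fs -E."""
    if chunk_kib % block_kib:
        raise ValueError("RAID chunk size must be a multiple of the block size")
    stride = chunk_kib // block_kib
    stripe_width = stride * DATA_DISKS[level](disks)
    return stride, stripe_width


if __name__ == "__main__":
    stride, width = ext_raid_options(chunk_kib=64)
    print(f"-E stride={stride},stripe_width={width}")
```

  With those assumed values, a 4-disk RAID10 works out to a stride of 16 blocks and a stripe width of 32 blocks.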
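  Here's a handful of numbers from bonnie++ on my 4-disk RAID10. Columns follow bonnie++'s usual order: file size; sequential output per-char, block and rewrite; sequential input per-char and block; random seeks. Throughput is in K/sec (seeks in /sec), each followed by %CPU, and the latency lines give the same six tests in order:
  EXT3fs:  4G | 246 97% | 61403 29% | 39928 11% | 1512 95% | 166253 24% | 525.3 10%
  EXT3fs latency:  87699us | 4739ms | 644ms | 54683us | 69023us | 302ms
  Reiser3: 4G | 264 97% | 65732 31% | 44530 15% | 1447 95% | 164567 34% | 557.9 18%
  Reiser3 latency: 33368us | 4201ms | 4061ms | 21967us | 134ms | 118ms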
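  Expressed as percentage differences (Reiser3 relative to EXT3; higher is better for all six throughput columns), the two runs work out as follows. The dictionary keys are just labels for the bonnie++ columns.

```python
# Throughput columns copied from the two bonnie++ runs above (K/sec, seeks in /sec).
ext3 = {"per-char out": 246, "block out": 61403, "rewrite": 39928,
        "per-char in": 1512, "block in": 166253, "seeks": 525.3}
reiser3 = {"per-char out": 264, "block out": 65732, "rewrite": 44530,
           "per-char in": 1447, "block in": 164567, "seeks": 557.9}


def pct_diff(new, old):
    """Percentage change from old to new, rounded to one decimal place."""
    return round(100.0 * (new - old) / old, 1)


diffs = {column: pct_diff(reiser3[column], ext3[column]) for column in ext3}

if __name__ == "__main__":
    for column, pct in diffs.items():
        print(f"{column:>12}: {pct:+.1f}%")
```

  That gives Reiser3 roughly +7% on per-char and block output, +11.5% on rewrite and +6.2% on random seeks, but -4.3% and -1.0% on the two sequential input tests.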
12. Re:It's Not Hans by jobst · 2010-01-14 16:42 · Score: 1
  
  that COULD mean its a REALLY GOOD fs.
  
  --
  to code or not to code, that is the question.
13. Re:It's Not Hans by Anonymous Coward · 2010-01-14 21:03 · Score: 1, Funny
  
  I think you're a nice person to say that.
  By virtue of this, I ignore your having an AOL IM account. :)
14. Re:It's Not Hans by vegiVamp · 2010-01-14 23:15 · Score: 1
  
  I generally agree with the useful contribution thing, but I assume that he *likes* working on his filesystem, and being able to do so with even fewer interruptions from outside life may not be the punishment it should be.
  
  --
  What a depressingly stupid machine.
15. Re:It's Not Hans by Ash-Fox · 2010-01-14 23:49 · Score: 1
  
  Tell that to the murdered woman's family.
  Sure. I have no problem doing that. How do I go about doing that?
  
  --
  Change is certain; progress is not obligatory.
16. Re:It's Not Hans by Zebedeu · 2010-01-15 00:18 · Score: 1
  
  Why should the family have a say in this? They have an obvious emotional investment in the case which precludes rational thought.
  Continuing with your line of thought, why not let the family decide what the penalty for the crime should be? I'm guessing they think prison is too light a sentence for him.
17. Re:It's Not Hans by Enderandrew · 2010-01-15 04:04 · Score: 1
  
  I liked Reiser4 and ran it for a few years. If they continue to improve it, and get it in mainline, I might use it again.
  
  --
  http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
18. Re:It's Not Hans by sproingie · 2010-01-15 06:20 · Score: 1
  
  Ah, I did forget about that default. I do know about the quality of anecdotes (it's why "support horror stories" never move my buying decisions much) but the run time of a full fsck is a well-known problem.
  Maybe Reiser4 fixed all that, but now it does go back to the politics of picking a fs with no stable developer base.
19. Re:It's Not Hans by RichiH · 2010-01-16 05:25 · Score: 1
  
  Sure, let him freely access a computer that none of the staff has any clue about. Add a UMTS USB dongle and you have the perfect communications gateway for all inmates. I am not saying that they don't have the odd phone stashed away anyway, but let's not make things too easy ;)
  I do basically agree that it is always better to give inmates something to do, preferably something they enjoy and can use in the real world once they are out. But I am not convinced this specific idea is good.
20. Re:It's Not Hans by mqduck · 2010-01-16 23:03 · Score: 1
  
  As another poster said, the victim's family shouldn't have a say in the murderer's punishment. They're hardly an unbiased party.
  But I'm not even sure why that's relevant. Why *wouldn't* the family agree with what I said?
  
  --
  Property is theft.
21. Re:It's Not Hans by mqduck · 2010-01-16 23:07 · Score: 1
  
  Unless you're saying that being able to spend your time developing a filesystem is such an attractive alternative to freedom that people contemplating murder don't find it to be a real deterrent, I don't see why his finding his work on ReiserFS enjoyable is a problem.
  
  --
  Property is theft.
22. Re:It's Not Hans by mqduck · 2010-01-16 23:08 · Score: 1
  
  Are we really so barbaric that conditions provided to criminals are unacceptable if they're not miserable enough?
  
  --
  Property is theft.
23. Re:It's Not Hans by mqduck · 2010-01-16 23:18 · Score: 1
  
  I think you're a nice person to say that.
  I wasn't trying to be nice. In fact, what I said is arguably at least as selfish as anything else.
  
  By virtue of this, I ignore your having an AOL IM account. :)
  I can't figure out what the joke is, here. Most likely, that makes me even more worthy of whatever the mockery is. :-P But anyway, people of my generation in my part of the world almost all use AIM primarily. So that answers that. I have an ICQ account that I log onto only for the pure sake of it; I haven't used it in about a decade.
  
  --
  Property is theft.
24. Re:It's Not Hans by RichiH · 2010-01-17 00:16 · Score: 1
  
  No. And growing up in Germany, I subscribe to the "lock them away so they cannot harm the rest of society" and not the North American "lock them away so they suffer for what they did".
  I also strongly support re-socialization, especially for younger delinquents as experience has shown again and again that these programs _work_.
  That being said, I assume a murderer is in a high-security prison, which means that inmates should probably not have unlimited communication with the outside world, so as to break existing command structures in organized crime. And you can be certain that anyone who has this kind of access will be 'approached' by his fellow inmates in a matter of hours or days.
25. Re:It's Not Hans by vegiVamp · 2010-01-18 00:06 · Score: 1
  
  Hackers. Basements.
  
  Yes, it's a stereotype, and therefore not always true, but I imagine the kind of programmer that sits down and starts hacking together a filesystem that's years ahead of most things that are available at the time, to also be the kind of programmer that doesn't really need a lot of sunlight to thrive.
  
  I don't even remotely know Hans Reiser, but he somehow strikes me as the type. Why would he care about perceived notions of freedom - which he'll get in due time anyway - if he can do exactly what he loves to do until that time comes? Doing that would probably even count towards 'good behaviour' and maybe get him out early.
  
  --
  What a depressingly stupid machine.
XFS performance highly variable by bzipitidoo · 2010-01-14 09:14 · Score: 3, Interesting

I've used XFS on a RAID1 setup with SATA drives, and found the performance of the delete operation extremely dependent on how the partition was formatted.
I saw times of up to 5 minutes to delete a Linux kernel source tree on a partition that was formatted XFS with the defaults. I had to use something like sunit=64, swidth=64, and even then it takes 5 seconds to rm -rf /usr/src/linux. I've heard that SAS drives wouldn't exhibit this slowness. Under Reiserfs on the same system, the delete took 1 second. Anyway, XFS is notorious for slow delete operations.
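If you want to reproduce this kind of measurement, here is a minimal Python sketch. It builds a synthetic tree of small files (a stand-in for a kernel source tree; the counts and sizes are arbitrary assumptions) and times a recursive delete, which is roughly what rm -rf does. FS_UNDER_TEST is a hypothetical environment variable for pointing it at a directory on the XFS (or other) filesystem being compared.

```python
import os
import shutil
import tempfile
import time


def build_tree(root, dirs=20, files_per_dir=250, size=512):
    """Create `dirs` subdirectories of small files (a stand-in for a source tree).

    Returns the number of files created.
    """
    payload = b"x" * size
    for d in range(dirs):
        sub = os.path.join(root, f"dir{d:03d}")
        os.makedirs(sub)
        for i in range(files_per_dir):
            with open(os.path.join(sub, f"file{i:04d}.c"), "wb") as fh:
                fh.write(payload)
    return dirs * files_per_dir


def time_delete(root):
    """Return the wall-clock seconds taken to recursively delete `root` (like rm -rf)."""
    start = time.perf_counter()
    shutil.rmtree(root)
    return time.perf_counter() - start


if __name__ == "__main__":
    # FS_UNDER_TEST is a hypothetical env var naming a directory on the
    # filesystem being benchmarked; otherwise the system temp dir is used.
    base = tempfile.mkdtemp(dir=os.environ.get("FS_UNDER_TEST"))
    tree = os.path.join(base, "tree")
    count = build_tree(tree)
    os.sync()  # flush the creates first so the timing is mostly the delete
    print(f"deleted {count} files in {time_delete(tree):.2f}s")
    os.rmdir(base)
```

The timing only covers what shutil.rmtree actually waits for; work the filesystem defers to its journal or background writeback is not captured, so compare runs on identically formatted and mounted partitions (same sunit/swidth, same barrier settings).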

--
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
1. Re:XFS performance highly variable by ShadowRangerRIT · 2010-01-14 09:47 · Score: 1
  
  For a lot of modern corporate data storage situations, deletion isn't really important. My company uses an in-house write-once file system (no idea what it's based on), because by and large, the cost of storing old data is negligible next to the advantages of being able to view an older version of the dataset, completely remove fragmentation from the picture, etc. I suspect deletion operations are fairly uncommon at Google; in the rare cases it is necessary it is quite possible they just copy the data they want to keep to a new location, then flash the drive completely.
  
  --
  $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
2. Re:XFS performance highly variable by Anonymous Coward · 2010-01-14 10:00 · Score: 2, Interesting
  
  Mounting with nobarrier will turn those 5 minutes into 5 seconds, but then don't turn off your computer during the delete.
3. Re:XFS performance highly variable by hvm2hvm · 2010-01-14 10:09 · Score: 1
  
  When did Google delete any information they had on their drives? :P
  
  --
  ics
4. Re:XFS performance highly variable by Lennie · 2010-01-14 10:17 · Score: 1
  
  This doesn't matter to Google. Google wants to keep as much data as possible. ;-)
  
  --
  New things are always on the horizon
5. Re:XFS performance highly variable by Seq · 2010-01-14 18:04 · Score: 1
  
  This may not have come up in testing as Google never actually deletes anything.
  
  --
  -- Seq
6. Re:XFS performance highly variable by fusiongyro · 2010-01-14 18:57 · Score: 1
  
  I'm under the impression SGI traded delete expense for other benefits, like faster writes and more reliable streaming data performance. This kind of tradeoff is pretty common in data structures; AVL trees, for example, have a more costly insertion in exchange for better guarantees about access than red/black trees. Presumably the same kinds of tradeoffs exist for on-disk data structures as for in-memory ones.
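  To put numbers on the in-memory side of that analogy, the textbook worst-case height bounds for a tree holding n keys are roughly as follows (the exact constants depend on how height is counted):

  $$
  h_{\mathrm{AVL}} < 1.4405\,\log_2(n+2) - 0.3277, \qquad h_{\mathrm{RB}} \le 2\,\log_2(n+1)
  $$

  For n = 10^6 that works out to about 28 levels for an AVL tree versus up to about 40 for a red-black tree, which is the tighter access guarantee. The price is extra rebalancing work: AVL updates must retrace and fix balance factors up the tree, and an AVL deletion can need O(log n) rotations, whereas a red-black tree needs at most two rotations per insertion and three per deletion.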
  I've had my disk eaten by XFS, ReiserFS and ext2, but it's happened less often with XFS. I vaguely remember ReiserFS bugs looking a lot more psychedelic though (halfway through one corrupted file, finding the middle of another corrupted file's content, etc.) whereas with XFS, the files were simply gone. I haven't tried an ext filesystem in a long time but I recall them being recommended for servers due to their stability, not doing anything terribly fancy.
7. Re:XFS performance highly variable by vegiVamp · 2010-01-14 23:28 · Score: 1
  
  Slow delete is a benefit if you accidentally typed 'rm -rf /'.
  
  --
  What a depressingly stupid machine.
GFS by jonpublic · 2010-01-14 09:16 · Score: 3, Insightful

I thought google had their own file system named the google files system.
http://labs.google.com/papers/gfs.html
1. Re:GFS by jonpublic · 2010-01-14 09:21 · Score: 1, Insightful
  
  I should probably read my own posts before hitting submit.
2. Re:GFS by FlyingBishop · 2010-01-14 09:38 · Score: 1
  
  Meh, I always do.
3. Re:GFS by FlyingBishop · 2010-01-14 09:47 · Score: 2, Funny
  
  I meant never.
4. Re:GFS by dotgain · 2010-01-14 10:15 · Score: 2, Funny
  
  Understandable - the keys are right next to each other.
5. Re:GFS by joib · 2010-01-14 10:26 · Score: 3, Informative
  
  I believe GFS uses a local fs on each node to take care of, well, all the stuff that a normal local fs like ext3 does. GFS only does the distributed stuff on top of that.
6. Re:GFS by PPH · 2010-01-14 15:25 · Score: 1
  
  GooFS?
  
  --
  Have gnu, will travel.
Windows Driver by pgn674 · 2010-01-14 09:17 · Score: 1

Might this prompt someone at Google to make an installable file system driver for Windows for EXT4? Right now there is none, because of differing inode sizes and some extra features beyond EXT2 that EXT4 requires, I think.
1. Re:Windows Driver by fuzzyfuzzyfungus · 2010-01-14 09:38 · Score: 4, Insightful
  
  I can't imagine why it would.
  
  To the best of my knowledge, Google uses pretty much no Windows servers themselves (at least not for any of their public-facing products; they almost certainly have some kicking around) and "a vast number of instances of custom in-house server applications" is among the least plausible environments for a Windows server deployment, so that is unlikely to change.
  
  On the desktop side, Google has a bunch of stuff that runs on Windows; but it all communicates with Google's servers over various ordinary web protocols and stores local files with the OS provided filesystem. The benefits of EXT4 on Windows would have to be pretty damn compelling for them to start requiring a kernel driver install and a spare unformatted partition.
  
  I suppose it is conceivable that some Google employee might decide to do it, for more or less inscrutable reasons; but it would have no connection at all to Google's broader operation or strategy.
2. Re:Windows Driver by Teun · 2010-01-14 12:38 · Score: 1
  
  Double booting.
  
  --
  "The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
3. Re:Windows Driver by drinkypoo · 2010-01-14 23:42 · Score: 1
  
  Installing the ext2 IFS on Windows XP leads to frequent lockups and crashes. I've tried it on several machines now, and over several versions. Why would you want one anyway? Work in Linux.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
where are the benchmarks? by Alvaro+Martinez · 2010-01-14 09:26 · Score: 1

I didn't read the funky article because it's been slashdotted, but I'd like to see the benchmarks properly.
Ubuntu 9.10? by GF678 · 2010-01-14 09:36 · Score: 4, Interesting

Gee, I hope they're not using Ubuntu 9.10 by any chance: http://www.ubuntu.com/getubuntu/releasenotes/910

There have been some reports of data corruption with fresh (not upgraded) ext4 file systems using the Ubuntu 9.10 kernel when writing to large files (over 512MB). The issue is under investigation, and if confirmed will be resolved in a post-release update. Users who routinely manipulate large files may want to consider using ext3 file systems until this issue is resolved. (453579)
The damn bug is STILL not fixed apparently. Some people get the corruption, and some don't. Scares me enough to not even try using ext4 just yet, and I'm still surprised Canonical was stupid enough to have ext4 as the default filesystem in Karmic.
Then again, perhaps Google knows what they're doing.
1. Re:Ubuntu 9.10? by Nimey · 2010-01-14 09:57 · Score: 1
  
  Then again, perhaps Google knows what they're doing.
  More so than your average Slashdotter, I expect.
  
  --
  Hail Eris, full of mischief...
  
  E pluribus sanguinem
2. Re:Ubuntu 9.10? by Lennie · 2010-01-14 10:22 · Score: 4, Insightful
  
  They employ the main developer of ext2, ext3 and ext4.
  
  He probably knows a lot about it.
  
  --
  New things are always on the horizon
3. Re:Ubuntu 9.10? by Anonymous Coward · 2010-01-14 10:24 · Score: 1, Interesting
  
  From the bug comments, this could be linked to a latent kernel bug in journal checksums, which went unnoticed until they were enabled by default after 2.6.31; that default was reverted in 2.6.32-rc6. If Ubuntu picked up that patch for their kernel, that would have caused corruption.
  http://bugzilla.kernel.org/show_bug.cgi?id=14354
4. Re:Ubuntu 9.10? by Randle_Revar · 2010-01-14 10:48 · Score: 1, Insightful
  
  Ubuntu makes no sense for a company with Google's size, resources, and needs
  
  --
  Climate Progress - Hell and High Water
5. Re:Ubuntu 9.10? by rdnetto · 2010-01-14 12:57 · Score: 1
  
  I learnt the hard way - when I upgraded to 9.04 (and specifically selected ext4) I found that the system would crash when I emptied trash. Ever since then I've stuck to XFS.
  
  --
  Most human behaviour can be explained in terms of identity.
6. Re:Ubuntu 9.10? by RoboRay · 2010-01-14 14:58 · Score: 2, Interesting
  
  Yeah, they've got their own custom OS... Goobuntu.
7. Re:Ubuntu 9.10? by tytso · 2010-01-14 18:03 · Score: 3, Informative
  
  So Canonical has never reported this bug to LKML or to the linux-ext4 list as far as I am aware. No other distribution has complained about this > 512MB bug, either. The first I heard about it is when I scanned the Slashdot comments.
  Now that I know about it, I'll try to reproduce it with an upstream kernel. I'll note that in 9.04, Ubuntu had a bug which, as far as I know, must have been caused by their screwing up some patch backports. Only Ubuntu's kernel had a bug where rm'ing a large directory hierarchy would tend to cause a hang. No one was able to reproduce it on an upstream kernel.
  I will say that I don't ever push patches to Linus without running them through the XFS QA test suite (which is now generalized enough that it can be used on a number of file systems other than just XFS). If it doesn't already have a "write a 640 MB file and make sure it isn't corrupted" test, we can add it, and then all of the file systems that use the XFS QA test suite can benefit from it.
  (I was recently proselytizing the use of the XFS QA suite to some Reiserfs and BTRFS developers. The "competition" between file systems is really more of a fanboy/fangirl thing than something at the developer level. In fact, Chris Mason, the head btrfs developer, has helped me with some tricky ext3/ext4 bugs, and in the past couple of years I've been encouraging various companies to donate engineering time to help work on btrfs. With the exception of Hans Reiser, who has in the past accused me of trying to actively sabotage his project --- not true as far as I'm concerned --- we all are a pretty friendly bunch and work together and help each other out as we can.)
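  A "write a 640 MB file and make sure it isn't corrupted" check is easy to sketch. The following is a minimal Python version, not the actual XFS QA test: it writes a deterministic, position-dependent pattern, fsyncs it, and reads it back. One caveat: a real test would unmount/remount or drop caches between write and read; here posix_fadvise(POSIX_FADV_DONTNEED) is only an advisory hint, so the read-back may still come from the page cache. The function names are hypothetical.

```python
import hashlib
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB


def pattern_chunk(index):
    """Deterministic 1 MiB block derived from its index, so misplaced data is detectable."""
    seed = hashlib.sha256(index.to_bytes(8, "little")).digest()  # 32 bytes
    return seed * (CHUNK // len(seed))


def write_pattern(path, size_mb):
    """Write `size_mb` MiB of patterned data and force it to stable storage."""
    with open(path, "wb") as f:
        for i in range(size_mb):
            f.write(pattern_chunk(i))
        f.flush()
        os.fsync(f.fileno())
        if hasattr(os, "posix_fadvise"):
            # Drop the now-clean pages from the page cache so the read-back
            # is more likely to come from disk (advisory only).
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)


def verify_pattern(path, size_mb):
    """Return the index of the first bad MiB, or None if the file is intact."""
    with open(path, "rb") as f:
        for i in range(size_mb):
            if f.read(CHUNK) != pattern_chunk(i):
                return i
        if f.read(1):
            return size_mb  # trailing data beyond the expected length
    return None


def run_check(directory=None, size_mb=640):
    """Write, verify and remove a temp file of `size_mb` MiB in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        write_pattern(path, size_mb)
        return verify_pattern(path, size_mb)
    finally:
        os.remove(path)
```

  Calling run_check() with a directory on the filesystem under test runs the full 640 MiB check; a return value of None means every block read back as written.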
8. Re:Ubuntu 9.10? by inKubus · 2010-01-14 20:24 · Score: 1
  
  That's why people don't use Ubuntu or even Debian for important servers. I've got a Fedora Core 4 box that hasn't been rebooted since 2006 with quite a heavy load of web sites. In production I'm using CentOS 5.4 which is just fine with kernel 2.6.18. EXT4, pft. Google has plenty of money, they should use ramfs and add more ram and more boxes. Why even mess with disks for a search index? It's like the definition of volatile data.
  
  --
  Cool! Amazing Toys.
9. Re:Ubuntu 9.10? by Simetrical · 2010-01-15 05:44 · Score: 1
  
  That's why people don't use Ubuntu or even Debian for important servers.
  Wikipedia uses Ubuntu for all its Linux servers. Hasn't had serious problems yet, and the distro packages are both reasonably stable and up-to-date, which is true of few to no other major distributions.
  
  --
  MediaWiki developer, Total War Center sysadmin
Give us a +-0 Counterbalance by itomato · 2010-01-14 09:38 · Score: 2, Interesting

When does black become white?
#CCCCCC or #888888
Is there overlap with Flamebait?
When does an otherwise 'troll' moderation-worthy comment lose out on status that could validate 19 responses, with 50% scoring +2?
Sometimes a troll is a troll, but sometimes it's just a shadow.
well, duh by Dan+Yocum · 2010-01-14 09:52 · Score: 1

"In their benchmarking, EXT4 and XFS performed, as impressively as each other."
Welcome to 2001, subby. Glad you could make it this decade.
I completely understand them not jumping to XFS, though. I'd never want to convert exabytes of data from one FS to another.
Re:Well by Nadaka · 2010-01-14 09:52 · Score: 1

what about all the people who don't even bother to log in to post as AC?
Downtime by Joucifer · 2010-01-14 10:01 · Score: 2, Interesting

Is this why Google was down for about 30 minutes today? Did anyone else even experience this or was it a local issue?
1. Re:Downtime by mirix · 2010-01-14 11:13 · Score: 1
  
  impossible, google doesn't go down.
  
  --
  Sent from my PDP-11
Re:Well by Captain+Splendid · 2010-01-14 10:07 · Score: 2, Informative

Or, you could stop being lazy and go tweak your preferences, thereby saving the rest of us from your whining.

--
Linux, you magnificent bastard, I read the fucking manual!
Re:I upgraded from ext3 to ext4 and by jjohnson · 2010-01-14 10:21 · Score: 3, Informative

When you run data centres around the world that are collectively the most powerful supercomputer known to man, you too can get a front page story on ./ announcing your upgrade.
Until then, STFU.

--
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
XFS has one major problem by ChipMonk · 2010-01-14 10:32 · Score: 1

The data path from program to disk is loooong. On a system with heavy CPU load, benchmarks on a well-tuned XFS system can fall to the same level as ext2 with defaults. Even multi-core doesn't help XFS under load; running Folding@Home at nice +19 still sucker-punched it.

JFS? It fails to scale on disk-saturated systems. However, it does have some optimizations specific to database workloads. Populating a sparse file ran fastest on my system, where XFS was a total fail.

ext3 under heavy CPU load showed degradation that appeared in the benchmarks, but was noticeable on the desktop only if I was watching for it. And ext4 (formatted, not converted from ext2/3) under load is faster than ext3 without load, when using "elevator=noop" at boot.

N.B.: The above benchmarks on my system all used external journals, except ext2, natch.
1. Re:XFS has one major problem by sznupi · 2010-01-14 15:40 · Score: 1
  
  I'm curious in what way exactly JFS fails to scale on disk-saturated systems?
  Mostly because sometimes I look into possible alternatives for "standard" ext family, and JFS seems quite sensible - without any serious issues or perf problems (where it's not stellar but also solid across the spectrum), utilizing free disk space most efficiently and with very low cpu usage; both things might be handy in some situations.
  
  --
  One that hath name thou can not otter
Re:Well by icebraining · 2010-01-14 10:34 · Score: 1

You can configure a higher threshold; 1 should be enough to filter out most ACs.

--
Dilbert RSS feed
Re:Well by labradore · 2010-01-14 10:53 · Score: 1

I assume you mean Increase the signal-to-noise ratio. Did you mean reduce the noise floor?
Here ya go.. by msimm · 2010-01-14 10:54 · Score: 1

Truthfully though, where the heck are the meta-data based filesystems that we were promised? I'd love to be able to, on a filesystem level, instantly pull up a folder view of all videos - or all images. Or all images of my dog. Or all images outdoors. Or all images of my dog outdoors.
Here ya go.

--
Quack, quack.
1. Re:Here ya go.. by MichaelSmith · 2010-01-14 14:27 · Score: 1
  
  I am not sure I want to trust my porn collection to mysql. Postgres possibly. Oracle definitely. But I don't think I can afford that.
  
  --
  http://michaelsmith.id.au
2. Re:Here ya go.. by MBGMorden · 2010-01-14 15:58 · Score: 1
  
  I admin databases for a living - I'm well aware of the concept ;). I just want it available at the filesystem level, not as a tack-on organizer.
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
Re:I upgraded from ext3 to ext4 and by Jake+Griffin · 2010-01-14 10:59 · Score: 1

(See first post)

--
SIG FAULT: Post index out of bounds.
NEXT UP by kuzb · 2010-01-14 11:06 · Score: 1

BREAKING NEWS:
Google switches to new softer 2-ply toilet paper to reduce employee chafing.

--
BeauHD. Worst editor since kdawson.
Re:Has Ted Cooked the Benchmarks Again? by tytso · 2010-01-14 12:11 · Score: 5, Informative

So I'm not sure what you're talking about. If you're talking about delayed allocation, XFS has it too, and the same buggy applications that don't use fsync() will also lose information after a buggy proprietary Nvidia video driver crashes your machine, regardless of whether you are using XFS or ext4.
If you are talking about the change to _ext3_ to use data=writeback, that was a change that Linus made, not me, and ext4 has always defaulted to data=ordered. Linus thought that since the vast majority of Linux machines are single-user desktop machines, the performance hit of data=ordered, which is designed to prevent exposure of uninitialized data blocks after a crash, wasn't worth it. I and other file system engineers disagreed, but Linus's kernel, Linus's rules. I pushed a patch to ext3 which makes the default a config option, and as far as I know the enterprise distros plan to use this config option to keep the defaults the same as before for ext3.
Since it was my choice, I actually changed the defaults for ext4 to use barrier=1, which Andrew Morton had vetoed for ext3 because, again, he didn't think it was worth the performance hit. But with ext4, the benefits of delayed allocation and extents are so vast that they completely outweigh the performance hit of turning on write barriers. That is where most of ext4's performance benefits come from, and it is very much a huge step forward compared to ext3.
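For application authors, the usual way to stop depending on ext3's old data=ordered behaviour under delayed allocation is to fsync() the new contents before renaming them over the old file. A minimal Python sketch, assuming a POSIX system such as Linux where a directory can be opened and fsync()ed; atomic_write is a hypothetical helper name, not a library function:

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so a crash leaves either the old or the new contents.

    Write to a temp file in the same directory, fsync it, rename it over the
    target, then fsync the directory so the rename itself is durable.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer into the kernel
            os.fsync(f.fileno())   # force the data blocks out before the rename
        os.replace(tmp, path)      # atomic rename over the old file
    except BaseException:
        os.unlink(tmp)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)           # make the new directory entry durable
    finally:
        os.close(dir_fd)
```

Note that mkstemp() creates the temporary file with mode 0600, so copy the original file's permissions onto it first if they matter.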
So with respect, you don't know what you are talking about.
-- Ted
ext4 much faster than ext3 for large files & d by Mandrel · 2010-01-14 12:27 · Score: 1

I've seen huge performance leaps for large files and directories after reinstalling my system on an ext4 partition. Ext3 was very slow to list directories containing large numbers of files, and deleting very large files took tens of seconds, during which the filesystem was hung. I couldn't remove large files while recording TV, otherwise the recording would hang and skip several seconds. No longer the case now I'm on ext4.
Re:Well by budgenator · 2010-01-14 13:19 · Score: 1

What's the fun in that? How would you know if somebody flames you? Half the time I get flamed, the initiating post ends up modded to +5.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds
Mod parent UP please. by leoxx · 2010-01-14 14:38 · Score: 1

Hello moderators?
Re:"Their is the problem" by Lennie · 2010-01-14 22:47 · Score: 1

English is not my mother tongue, sorry if I occasionally make a mistake. I guess I wasn't paying attention.

Let me add to the original discussion:
especially not some big corporation

--
New things are always on the horizon
Re:Has Ted Cooked the Benchmarks Again? by Lisandro · 2010-01-14 23:10 · Score: 1

Great post. Thank you for your insight!
What about reliability of EXT4 by jassuncao · 2010-01-14 23:24 · Score: 1

I live in an area where power failures are very common. While I was using EXT3 I lost data several times due to power failures, and once a disk even got corrupted. After I switched to JFS, data loss has been minimal and I've never had a corrupted disk. Another thing I like about JFS is that it's really quick to fsck a disk after a power failure. So is it safe to switch to EXT4?
WTF? GFS DUDE by hesaigo999ca · 2010-01-15 02:15 · Score: 1

Google has their own proprietary file system called GFS (and now GFS2), so who came up with this rubbish?
They have a special file system because of their design demands and the inherent flaws in most file systems when you cluster vast numbers of computers together.
What I want to know is what the writer of this post thinks he will accomplish by sending out this garbage!
1. Re:WTF? GFS DUDE by RichiH · 2010-01-16 05:49 · Score: 1
  
  GFS works on top of a normal FS. Facts are funny like that, especially when you are already in outrage mode.
  What do you think you will accomplish by sending out this garbage is what I want to know! ;)
2. Re:WTF? GFS DUDE by hesaigo999ca · 2010-01-18 01:52 · Score: 1
  
  Once you take something such as a file system, however incomplete, and then modify it to the Ts to such an extent that it has become your own file system, I doubt you could apply a patch of any sort to it, at the base, as you say, that would fix anything that might resemble the old file system!
Re:Has Ted Cooked the Benchmarks Again? by segedunum · 2010-01-16 14:44 · Score: 1

So I'm not sure what you're talking about. If you're talking about delayed allocation, XFS has it too, and the same buggy applications...
Stop blaming the applications for a filesystem problem Ted. The excuse doesn't wash no matter how many times you use it, and no, XFS does not have it.
Re:Has Ted Cooked the Benchmarks Again? by tytso · 2010-01-17 02:53 · Score: 2, Informative

So I'm not sure what you're talking about. If you're talking about delayed allocation, XFS has it too, and the same buggy applications...
Stop blaming the applications for a filesystem problem Ted. The excuse doesn't wash no matter how many times you use it, and no, XFS does not have it.
http://en.wikipedia.org/wiki/XFS#Delayed_allocation
Any other questions? At the very least the applications are non-portable in the sense that they were depending on behavior not guaranteed by POSIX. XFS, btrfs, ZFS, and many if not most modern file systems do delayed allocation. It's one of the basic file system tricks to improve performance.
Re:Has Ted Cooked the Benchmarks Again? by segedunum · 2010-01-17 13:51 · Score: 1

Any other questions? At the very least the applications are non-portable in the sense that they were depending on behavior not guaranteed by POSIX.
The code written in those applications has been around for years, so stop trying to blame that for a problem that only materialised recently (although the 'problem' shouldn't be new to anyone really). A filesystem blaming userspace for certain things happening and hiding behind POSIX for well known behaviour and code that should be well tested first is one of the most retarded, and worrying, things I have ever heard. Userspace is not going to be 'fixed' in this regard for reasons which should be damn obvious. No, we're not all going to switch to sqlite. Yes, small reads and writes are part and parcel of a great many applications, and will be for years to come. Granted, XFS has historically had more of a problem in this area than other filesystems but at least they have a well tested implementation that is years ahead of ext4 - not that the approach isn't more 'risque'.

It just whiffs of some backside covering, that's all. In any case, 'XFS does it too!' isn't much of a defence, especially given the use cases of the ext* line of filesystems and that ext4 is expected to be an ext2/3 replacement.
Re:Has Ted Cooked the Benchmarks Again? by tytso · 2010-01-17 14:18 · Score: 2, Informative

So before I tried agitating for programmers to fix their buggy applications, I had already implemented both the heuristic that XFS uses (if you truncate a file descriptor, add an implicit fsync on the close of that fd), and in addition I had implemented another heuristic (if you rename on top of an existing file, fsync the source file of the rename). This was to work around buggy applications, and as you can see, ext4 does even more than XFS does.
At the end of the day, though, the heuristic can sometimes get things wrong, and sometimes the heuristic will be too aggressive in forcing fsync()'s when it's not really necessary, which is why it's good to at least try to educate application programmers about something which even you agree shouldn't be a new thing.
(For example, if you don't fsync, and you want to run your application on another OS, like say, Solaris, you will be very sad.)
But it wasn't backside covering. Although most people don't seem to realize it, FIRST I added the heuristics to work around the buggy code, and THEN I agitated for people to fix their d*mn code. But application programmers don't like being told that they are wrong, so this seems to be a case of "blame/shoot the messenger" --- with me having been cast in the role of the messenger.