Denial-of-Service Attack Found In Btrfs File-System

Posted by timothy on Friday December 14, 2012 @01:24PM from the at-that-range-a-hammer-works-too dept.

An anonymous reader writes "It's been found that the Btrfs file-system is vulnerable to a Hash-DOS attack, a denial-of-service attack caused by hash collisions within the file-system. Two DOS attack vectors were uncovered by Pascal Junod that he described as causing astonishing and unexpected success. It's hoped that the security vulnerability will be fixed for the next Linux kernel release." The article points out that these exploits require local access.

210 comments

Who ported btrfs to DOS? by Nimey · 2012-12-14 13:27 · Score: 4, Funny

and should we give him a medal or lynch him?

--
Hail Eris, full of mischief...

E pluribus sanguinem
1. Re:Who ported btrfs to DOS? by macraig · 2012-12-14 13:46 · Score: 5, Funny
  
  Do I have to choose? Can I hang a medal on him, and then hang him? I'll make the medal 20 pounds to speed up the lynching.
2. Re:Who ported btrfs to DOS? by Anonymous Coward · 2012-12-14 15:07 · Score: 0
  
  You mean Fallen Art?
3. Re:Who ported btrfs to DOS? by Anonymous Coward · 2012-12-14 17:31 · Score: 0
  
  DoS would be the right acronym then.
4. Re:Who ported btrfs to DOS? by maxwell+demon · 2012-12-14 19:25 · Score: 3, Informative
  
  DOS = Disk Operating System
  DoS = Denial of Service
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
5. Re:Who ported btrfs to DOS? by byornski · 2012-12-15 00:31 · Score: 3, Funny
  
  DOS = Density of States
6. Re:Who ported btrfs to DOS? by Anonymous Coward · 2012-12-15 00:41 · Score: 0
  
  No shit. I guess you lack any sense of humor and are unable to understand a joke...
7. Re:Who ported btrfs to DOS? by maxwell+demon · 2012-12-15 01:31 · Score: 0
  
  And I guess you lack any experience of Slashdot. Otherwise you would have clicked the Parent link to see which post I actually answered to. Hint: That post did not contain a joke.
  And BTW, the original joke was funny because it played exactly on the misspelling which was already in the summary, and used the correct interpretation of the acronym.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
8. Re:Who ported btrfs to DOS? by Anonymous Coward · 2012-12-15 01:37 · Score: 0
  
  DOS = 2
9. Re:Who ported btrfs to DOS? by Pf0tzenpfritz · 2012-12-15 21:01 · Score: 1
  
  DOS++; if (DOS != TRES) { return ("unit test failed"); }
  
  --
  Oh, the beautiful gloss of greality!
Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 13:35 · Score: 5, Interesting

btrfs is a step in the right direction, but even now, Linux does not have production-level deduplication (which even Windows has, for crying out loud), encryption, snapshots, or something even close to supplanting LVM2.
I just got out of a meeting at my job because we are replacing some old large servers... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)
This doesn't mean to suck Sun's teat for ZFS access... but at least try to come close to what even NTFS or even ReFS offers...
1. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 13:46 · Score: 1
  
  What's Sun?
2. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 13:49 · Score: 0
  
  zfs on linux is probably more stable than btrfs is.
3. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 14:02 · Score: 5, Informative
  
  ZFS on FreeBSD or FreeNAS is great. Easily saturates gigE with a simple mirror of recent 7200rpm disks. It scales up from there, and FreeBSD is pretty rock solid.
4. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 14:07 · Score: 4, Interesting
  
  btrfs is a step in the right direction, but even now, Linux does not have production-level deduplication (which even Windows has, for crying out loud), encryption, snapshots, or something even close to supplanting LVM2.
  I just got out of a meeting at my job because we are replacing some old large servers... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)
  This doesn't mean to suck Sun's teat for ZFS access... but at least try to come close to what even NTFS or even ReFS offers...
  Hear, hear! Backup admin here. Before the unwashed masses of armchair Linux admins show up, I just want to add one example of an enterprise filesystem feature: the NTFS change journal. It makes the file-system scan for an incremental backup essentially independent of the total number of files, because only the recorded changes are read.
  On other systems with large numbers of files, it's sad to have to schedule subdirectories for different times of day just to deal with the scanning overhead.
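  A minimal Python sketch of why a change journal helps, assuming a hypothetical in-memory journal (a plain list where entry i is the path touched by change number i) rather than the real NTFS USN journal API: the full scan must stat every file, while the journal scan reads only the records added since the last backup.

```python
import os
from typing import List, Set, Tuple


def full_scan(root: str, last_backup: float) -> Tuple[List[str], int]:
    """Stat every file under root; the cost grows with the total file count."""
    changed: List[str] = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            examined += 1
            if os.stat(path).st_mtime > last_backup:
                changed.append(path)
    return changed, examined


def journal_scan(journal: List[str], last_seq: int) -> Tuple[Set[str], int]:
    """Read only the records appended since last_seq.

    journal[i] is the path touched by change number i (a hypothetical
    stand-in for a real change journal); the cost grows with the number
    of changes, not with the number of files.
    """
    new_records = journal[last_seq:]
    return set(new_records), len(new_records)
```

  With 50 files and one change, the full scan examines 50 entries while the journal scan examines 1; the gap widens as the file count grows and the change rate stays small.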
5. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 14:15 · Score: 0
  
  ... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)
  What kind of enterprise features would you need from a file system?
6. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 14:22 · Score: 5, Informative
  
  NTFS doesn't have snapshots. Instead it relies on volume shadow copies, with known severe performance artifacts caused by needing to move snapshotted data out of the way when new writes come in. Btrfs, like ZFS and Netapp's WAFL, uses a far more efficient copy-on-write strategy that avoids the write penalty. The takeaway: I would not go so far as to claim Microsoft has an enterprise-worthy solution either. If you want something with industrial-strength dedup, snapshots and fault tolerance, you won't be getting it from Microsoft.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
7. Re:Can we get a real Linux filesystem, please? by smash · 2012-12-14 14:26 · Score: 2
  
  Data integrity for one?
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
8. Re:Can we get a real Linux filesystem, please? by grumbel · 2012-12-14 14:41 · Score: 3, Informative
  
  I have seen the user-level ZFS crash multiple times, and it's also slow as hell. It's still worth it if you are short on storage and want to reduce the size of your backup, but I wouldn't exactly call it ready for production.
9. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 14:44 · Score: 2, Interesting
  
  Wouldn't it be cheaper and just as effective to use FreeBSD or FreeNAS for your data? If you're considering either Windows or Solaris, then obviously you don't need a specific operating system. I would think FreeBSD (or even ZFS on Linux) would suit your purposes better (and with less expense) than Windows or Solaris.
10. Re:Can we get a real Linux filesystem, please? by maz2331 · 2012-12-14 14:51 · Score: 5, Informative
  
  ZFS on Linux does exist as a kernel module that is pretty stable and works well: http://zfsonlinux.org/. It was put out by Lawrence Livermore National Lab, but it can't be included with the kernel distros due to GPL/CDDL license compatibility issues.
11. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 14:58 · Score: 1
  
  That's what you should expect from your storage array. And that is what you get with real storage arrays.
  A filesystem level approach to the problem can only be a bandaid, at best part of a larger solution.
12. Re:Can we get a real Linux filesystem, please? by dbIII · 2012-12-14 15:00 · Score: 3, Informative
  
  The kernel-level port probably is ready, though not on 32-bit (big hassles there, but probably not a big deal to most). On 64-bit there are some memory-usage problems, and performance seems to suck when a dozen or so hosts keep files on ZFS open via NFS at the same time. There's still a way to go before ZFS on Linux gets to where it is on FreeBSD, but it's still early days, and for many usage patterns it looks ready for production.
13. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 15:01 · Score: 2, Informative
  
  Linux has production level encryption, snapshots, and LVM2. What are you talking about?
  Unless you have very specific uses, deduplication should be done at your storage array really. It's not a high priority to implement in the filesystem. (No, your anecdote does not make it a high priority).
14. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 15:28 · Score: 1
  
  I have seen the user-level ZFS crash multiple times, and it's also slow as hell. It's still worth it if you are short on storage and want to reduce the size of your backup, but I wouldn't exactly call it ready for production.
  I think parent is talking about this, not the userlevel FUSE-based ZFS:
  http://zfsonlinux.org/
15. Re:Can we get a real Linux filesystem, please? by WWJohnBrowningDo · 2012-12-14 15:29 · Score: 2
  
  Did you guys look at FreeBSD?
16. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 15:44 · Score: 0
  
  Yes, it's great. Still can't shrink pools. Still uses 5GB of RAM per TB of disk for dedup at a 64K block size. But it's great.
  And just FYI, fucking FAT32 can saturate GbE with a *single* 7200rpm drive.
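  A rough check on the 5GB figure: the dedup table needs one entry per unique block, so its size is (pool size / block size) times the bytes per entry. Assuming the commonly quoted rule of thumb of about 320 bytes per in-core entry (an assumption, not a number from this thread), 1 TiB at 64 KiB blocks is 2^24 entries, which comes to exactly 5 GiB:

```python
KiB = 2 ** 10
GiB = 2 ** 30
TiB = 2 ** 40


def ddt_ram_bytes(pool_bytes: int, block_bytes: int, entry_bytes: int = 320) -> int:
    """Estimate dedup-table RAM: one entry per block, assuming no duplicates.

    entry_bytes=320 is a commonly quoted rule of thumb, not a measured value.
    """
    blocks = -(-pool_bytes // block_bytes)  # ceiling division
    return blocks * entry_bytes


print(ddt_ram_bytes(1 * TiB, 64 * KiB) / GiB)  # 5.0
```

  Because the table scales with the block count, halving the block size doubles it: under the same assumption, 8 KiB blocks would need about 40 GiB per TiB.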
17. Re:Can we get a real Linux filesystem, please? by Marxdot · 2012-12-14 15:59 · Score: 1
  
  Why should deduplication and snapshots (and even encryption, I suppose) be done by filesystems themselves? Why require a repetition of effort in implementing every filesystem? Also, ZFS is an insane thing written by people who don't seem to understand that keeping a good separation of concerns can lead to a rather slick set of general tools that can be used on almost any fs.
  Oh, right, 'enterprise features'. That certainly sets the alarm bells ringing.
18. Re:Can we get a real Linux filesystem, please? by blade8086 · 2012-12-14 16:06 · Score: 1
  
  LVM has snapshots and DM has encryption.
  And since when is deduplication a 'critical' enterprise feature?
  E.g., who else in the Unix world has it, other than ZFS, without an expensive add-on product?
  (Other than DragonFlyBSD's HAMMER, which unfortunately corporate weenies have testicles too small to deploy.)
  It may be critical for your application, but that doesn't mean Linux is mega-lagging behind.
19. Re:Can we get a real Linux filesystem, please? by jamesh · 2012-12-14 16:08 · Score: 4, Insightful
  
  NTFS doesn't have snapshots. Instead it relies on volume shadow copies, with known severe performance artifacts caused by needing to move snapshotted data out of the way when new writes come in. Btrfs, like ZFS and Netapp's WAFL, uses a far more efficient copy-on-write strategy that avoids the write penalty. The takeaway: I would not go so far as to claim Microsoft has an enterprise-worthy solution either. If you want something with industrial-strength dedup, snapshots and fault tolerance, you won't be getting it from Microsoft.
  What nonsense. VSS is the snapshot solution for NTFS, and of course it uses copy-on-write. Microsoft VSS backup architecture is years ahead of Linux... LVM is kind of cool but if you have a single database spread across multiple LV's then you can't snapshot them all as an atomic operation so it becomes useless. MS VSS does this, and always has.
  I'm normally a Linux fanboi, but when you spout rubbish like this I have no hesitation in correcting you.
20. Re:Can we get a real Linux filesystem, please? by Agent+ME · 2012-12-14 16:42 · Score: 2
  
  If snapshots are handled by the filesystem, then it could be possible to snapshot a specific directory or file rather than, for example, a whole partition. Snapshots in the filesystem also avoid needlessly preserving changes to space that was free when the snapshot was taken.
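  A toy Python model of that second point, with made-up block numbers: a block-level snapshot cannot tell free blocks from used ones, so it preserves the old contents of every block on its first write, while a filesystem-aware snapshot can skip blocks that were free when it was taken. The function and its parameters are hypothetical.

```python
from typing import Iterable, Set


def blocks_preserved(writes: Iterable[int], free_at_snapshot: Set[int], fs_aware: bool) -> int:
    """Count blocks whose pre-snapshot contents must be kept.

    fs_aware=False models a block-level snapshot: it cannot tell free blocks
    from used ones, so the first write to any block triggers preservation.
    fs_aware=True models a filesystem-level snapshot that skips blocks which
    were free when the snapshot was taken.
    """
    preserved: Set[int] = set()
    for block in writes:
        if fs_aware and block in free_at_snapshot:
            continue  # nothing worth keeping was there
        preserved.add(block)  # a set, so repeat writes are counted once
    return len(preserved)


# Blocks 5-9 were free at snapshot time.
print(blocks_preserved([0, 5, 6, 0, 7], {5, 6, 7, 8, 9}, fs_aware=False))  # 4
print(blocks_preserved([0, 5, 6, 0, 7], {5, 6, 7, 8, 9}, fs_aware=True))   # 1
```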
21. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 16:46 · Score: 3, Informative
  
  Tried to find some more information on this. First discovery: VSS stands for "Volume Shadow copy Service", not "Visual SourceSafe", as was my first association. :)
  AFAICT he's saying pretty much what Microsoft is saying:
  
  When a change to the original volume occurs, but before it is written to disk, the block about to be modified is read and then written to a "differences area", which preserves a copy of the data block before it is overwritten with the change. Using the blocks in the differences area and unchanged blocks in the original volume, a shadow copy can be logically constructed that represents the shadow copy at the point in time in which it was created.
  
  The disadvantage is that in order to fully restore the data, the original data must still be available. Without the original data, the shadow copy is incomplete and cannot be used. Another disadvantage is that the performance of copy-on-write implementations can affect the performance of the original volume.
  Do you have a newer reference?
22. Re:Can we get a real Linux filesystem, please? by darkain · 2012-12-14 17:03 · Score: 0
  
  Except, that isn't what you get with "real storage arrays" in practice, only in theory.
  http://en.wikipedia.org/wiki/ZFS#Silent_Data_Corruption
23. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 17:12 · Score: 0
  
  If your file system doesn't preserve data integrity, nothing on the storage layer can rectify that. I think you're actually talking about error resilience, and I'll disagree that handling storage errors is a "bandaid". Even assuming that it is feasible and cost-effective to put indefectible storage on important machines, the most common case is always going to be commodity disks.
24. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 17:17 · Score: 0
  
  NTFS doesn't have snapshots. Instead it relies on volume shadow copies, with known severe performance artifacts caused by needing to move snapshotted data out of the way when new writes come in. Btrfs, like ZFS and Netapp's WAFL, uses a far more efficient copy-on-write strategy that avoids the write penalty. The takeaway: I would not go so far as to claim Microsoft has an enterprise-worthy solution either. If you want something with industrial-strength dedup, snapshots and fault tolerance, you won't be getting it from Microsoft.
  What are you replying to, what does this have to do with change journals or backups? VSS does use COW, WTF is this...
25. Re:Can we get a real Linux filesystem, please? by LordLimecat · 2012-12-14 17:26 · Score: 3, Informative
  
  FAT32 is going to be faster than a LOT of filesystems precisely because it lacks features like dedup, any notion of real ACLs, and, oh, I don't know, data integrity. That's why if you want a really fast RAMDisk, you don't use NTFS or ReFS; you use FAT16 or FAT32.
26. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 18:07 · Score: 0
  
  There are dozens of stable Linux filesystems. Pick one. Millions of people use them. Most of what you call the internet uses them. 99.99% of all of the world's supercomputers use them. "Oh it doesn't blah." No, sparky, that's a lie. Linux doesn't keep redundant data in the first place. No good deduplication software? Don't put duplicate data on the system in the first place! Clearly there are Windows shills around not wanting the best, instead wanting the mickeysoft crap (once again). Lie to yourself, don't push your craphead lies onto us.
27. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 18:28 · Score: 1
  
  This is a tricky issue. If you keep all old file data where it is, in its original sectors, and write changes in new places, your files get fragmented to hell. Only your original snapshot is contiguous, while your current data is scattered about your disk. This may work fine if you have dozens of spindles making up your volume, or for an SSD, but it's not going to work for a regular HDD.
  What you'll end up with is fast write performance and horrible read performance. Since most files are read far more often than they're written, it's generally better to make the current data contiguous and the rarely-used snapshots fragmented.
  Of course, it's probably best to write the new data where it's convenient and later on do some defragmentation to put the data where it's fastest to read.
  dom
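  A toy Python model of that fragmentation trade-off, assuming a file that starts in physically contiguous blocks and free space at an arbitrary distant address: redirect-on-write moves the current file's overwritten blocks elsewhere, while copy-before-write keeps the current file in place and fragments the snapshot instead. The names and layout are hypothetical, and extent counts stand in for read seeks.

```python
from typing import List, Tuple


def count_extents(block_map: List[int]) -> int:
    """Physically contiguous runs met when reading the file in logical order."""
    if not block_map:
        return 0
    return 1 + sum(1 for a, b in zip(block_map, block_map[1:]) if b != a + 1)


def overwrite(n_blocks: int, overwrites: List[int], redirect: bool) -> Tuple[int, int]:
    """Overwrite logical blocks after a snapshot; return (current, snapshot) extents.

    The file starts in physical blocks 0..n_blocks-1, and free space starts
    at an arbitrary address well past the end of the file.
    """
    current = list(range(n_blocks))   # logical block -> physical block, live file
    snapshot = list(range(n_blocks))  # logical block -> physical block, snapshot
    next_free = n_blocks + 1000
    for logical in overwrites:
        if redirect:
            # Redirect-on-write: new data goes to free space; the snapshot
            # keeps pointing at the original block.
            current[logical] = next_free
            next_free += 1
        elif snapshot[logical] == current[logical]:
            # Copy-before-write, first overwrite since the snapshot: copy the
            # old block out for the snapshot, then write the new data in place.
            snapshot[logical] = next_free
            next_free += 1
        # Later copy-before-write overwrites of the same block go in place.
    return count_extents(current), count_extents(snapshot)


print(overwrite(100, [50], redirect=True))   # (3, 1)
print(overwrite(100, [50], redirect=False))  # (1, 3)
```

  Overwriting one block in the middle of a 100-block file leaves the current file in 3 extents under redirect-on-write, but in 1 extent (with the snapshot split into 3) under copy-before-write.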
28. Re:Can we get a real Linux filesystem, please? by guruevi · 2012-12-14 18:34 · Score: 1
  
  Solaris and its derivatives can be had for free. You don't HAVE to buy it, and its derivatives like OpenIndiana are very stable.
  
  --
  Custom electronics and digital signage for your business: www.evcircuits.com
29. Re:Can we get a real Linux filesystem, please? by smash · 2012-12-14 18:42 · Score: 1
  
  1. No storage array does it properly. 2. You can BUILD a ZFS storage array with de-dup, compression, self-healing, etc. for cheaper than you can buy a Netapp or EMC. A filesystem approach is the only way to ensure end-to-end data integrity, correcting transmission errors between the host and the storage, etc.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
30. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 18:43 · Score: 0
  
  RAM is cheap. Typically, shrinking pools is not required; I don't know of ANYONE who had a shrinking need for storage capacity.
31. Re:Can we get a real Linux filesystem, please? by belrick · 2012-12-14 18:58 · Score: 1
  
  Btrfs, like ZFS and Netapp's WAFL, uses a far more efficient copy-on-write strategy that avoids the write penalty.
  WAFL doesn't do copy-on-write. Copy-on-write means a write to a block in a file requires the original block to be read, written elsewhere for the snapshot, then the new block written in the original location. That's exactly what WAFL doesn't do. WAFL writes all changed blocks for multiple files in big RAID stripes, updating pointers to current copies and leaving snapshot pointers pointing to old copies of the updated files. Very efficient for writes, but changes almost all reads, random or sequential (within a file) into random reads (within the filesystem) because file blocks get scattered according to write order, not location of the block within the file. That's why they want lots of spindles in an aggregate and they love RAM cache and flash cache.
  But since you say that copy-on-write avoids the write penalty, I think you know what it does but simply don't know that it isn't copy-on-write.
32. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 18:59 · Score: 0
  
  Oh, because deduplication is now the new black?
  So one error on the disk can corrupt more than one file?
  Cool...
33. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 19:17 · Score: 5, Informative
  
  VSS is the snapshot solution for NTFS, and of course it uses copy-on-write
  Well. Maybe you better sit down in a comfortable chair and think about this a bit. From Microsoft's site: When a change to the original volume occurs, but before it is written to disk, the block about to be modified is read and then written to a “differences area”, which preserves a copy of the data block before it is overwritten with the change.
  Think about what this means. It is not a "copy-on-write", it is a "copy-before-write". Gross abuse of terminology if anybody tries to call it a "copy-on-write", which has the very specific meaning of "don't modify the destination data". Instead, copy it, then modify the copy. OK, are we clear? VSS does not do copy-on-write, it does copy-before-write.
  Now let's think about the implications of that. First, the write needs to be blocked until the copy-before-write completes, otherwise the copied data is not sure to be on stable storage. The copy-before-write needs to read the data from its original position, write it to some save area, then update some metadata to remember which data was saved where. How many disk seeks is that, if it's a spinning disk? If the save area is on the same spinning disk? If it's flash, how much write multiplication is that? When all of that is finally done, the original write can be unblocked and allowed to proceed. In total, how much slower is that than a simple, linear write? If you said "on the order of an order of magnitude" you would be in the ballpark. In fact, it can get way worse than that if you are unlucky. In the best imaginable case, your write performance is going to take a hit by a factor of three. Usually, much, much worse.
  OK, did we get this straight? As a final exercise, see if you can figure out who was talking nonsense.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
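  A simplified Python count of the I/Os in the two strategies being argued about here, assuming one I/O per step and no caching, batching or seek modelling (real redirect-on-write filesystems batch their pointer updates, and the seeks this model ignores are where Tough Love locates most of the real cost): copy-before-write pays a read plus two extra writes on the first write to each block after a snapshot, while redirect-on-write pays one pointer update per write. The function name and strategy labels are hypothetical.

```python
from typing import Dict, List, Set


def io_ops(writes: List[int], strategy: str) -> Dict[str, int]:
    """Count block I/Os for a stream of writes issued after a snapshot.

    Simplified: one I/O per step, no caching, batching or seek costs.
    """
    reads = writes_out = 0
    copied: Set[int] = set()
    for block in writes:
        if strategy == "copy-before-write":
            if block not in copied:
                copied.add(block)
                reads += 1       # read the original block
                writes_out += 2  # write it to the differences area + record where
            writes_out += 1      # the caller's data, written in place
        elif strategy == "redirect-on-write":
            writes_out += 2      # the caller's data, written elsewhere + pointer update
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return {"reads": reads, "writes": writes_out}


print(io_ops([1, 2, 3], "copy-before-write"))  # {'reads': 3, 'writes': 9}
print(io_ops([1, 2, 3], "redirect-on-write"))  # {'reads': 0, 'writes': 6}
```

  Under this model the penalty is concentrated on the first write to each block after a snapshot; repeated writes to the same block cost copy-before-write only one in-place write each.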
34. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 19:25 · Score: 2
  
  If you keep all old file data where it is, in its original sectors, and write changes in new places, your files get fragmented to hell.
  Microsoft's "shadow copy" doesn't work at the file level, it works at the block level, so it doesn't know anything about files. Btrfs and its ilk try to leave some empty space distributed across the volume, so copy-on-write can leave the copies in fairly reasonable places. After the copy is committed, the original space can be freed, so the next update won't mess things up too badly either. Snapshots mess this up because the original space doesn't get freed. But then, snapshots are always messed up, there is no such thing as a perfect snapshot strategy with respect to disk seeking. Incidentally, with flash you don't care about that any more, there is no seek time.
  Anyway, yes, with a crappy copy-on-write (like Netapp's) you get horrible read fragmentation. With an intelligent implementation, it isn't so bad. Note that Btrfs is turning in good benchmarks, including read performance in mixed read/write loads.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
35. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 19:29 · Score: 1
  
  LVM is kind of cool but if you have a single database spread across multiple LV's then you can't snapshot them all as an atomic operation so it becomes useless.
  You're also wrong about that. You can concatenate multiple logical volumes as a single logical volume and snapshot that atomically.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
36. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 19:41 · Score: 1
  
  Btrfs, like ZFS and Netapp's WAFL, uses a far more efficient copy-on-write strategy that avoids the write penalty.
  WAFL doesn't do copy-on-write. Copy-on-write means a write to a block in a file requires the original block to be read, written elsewhere for the snapshot, then the new block written in the original location. That's exactly what WAFL doesn't do. WAFL writes all changed blocks for multiple files in big RAID stripes, updating pointers to current copies and leaving snapshot pointers pointing to old copies of the updated files. Very efficient for writes, but changes almost all reads, random or sequential (within a file) into random reads (within the filesystem) because file blocks get scattered according to write order, not location of the block within the file. That's why they want lots of spindles in an aggregate and they love RAM cache and flash cache.
  But since you say that copy-on-write avoids the write penalty, I think you know what it does but simply don't know that it isn't copy-on-write.
  We both know what we're talking about, we just disagree on terminology. Properly, a "copy-on-write" doesn't modify the original destination. Nobody should ever use the term "copy-on-write" to describe the algorithm that is properly "copy-before-write". The strategy that leaves the original destination untouched and updates pointers to point at the modified copy is correctly called "copy-on-write", but because the terminology has been so commonly abused by the likes of Microsoft and their followers, it is better to be clear and call that "redirect-on-write".
  Finally, Netapp gets massive read fragmentation because they suck, not because it can't be avoided.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
37. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 19:42 · Score: 0
  
  Bullshit, I use all three weekly.
  I like ZFS, it's not a religion.
  NTFS has tons of features which no one uses :)
  Btrfs was fine, except for some bug with KVM.
38. Re:Can we get a real Linux filesystem, please? by myxiplx · 2012-12-14 19:50 · Score: 0
  
  Unless you go for ReFS, which is Microsoft's new file system available in Server 2012. It's still new, but looks to have all the best features of NTFS, ZFS and Btrfs rolled into one.
39. Re:Can we get a real Linux filesystem, please? by jamesh · 2012-12-14 20:04 · Score: 4, Insightful
  
  VSS is the snapshot solution for NTFS, and of course it uses copy-on-write
  Well. Maybe you better sit down in a comfortable chair and think about this a bit. From Microsoft's site: When a change to the original volume occurs, but before it is written to disk, the block about to be modified is read and then written to a “differences area”, which preserves a copy of the data block before it is overwritten with the change.
  Think about what this means. It is not a "copy-on-write", it is a "copy-before-write". Gross abuse of terminology if anybody tries to call it a "copy-on-write", which has the very specific meaning of "don't modify the destination data". Instead, copy it, then modify the copy. OK, are we clear? VSS does not do copy-on-write, it does copy-before-write.
  Now let's think about the implications of that. First, the write needs to be blocked until the copy-before-write completes, otherwise the copied data is not sure to be on stable storage. The copy-before-write needs to read the data from its original position, write it to some save area, then update some metadata to remember which data was saved where. How many disk seeks is that, if it's a spinning disk? If the save area is on the same spinning disk? If it's flash, how much write multiplication is that? When all of that is finally done, the original write can be unblocked and allowed to proceed. In total, how much slower is that than a simple, linear write? If you said "on the order of an order of magnitude" you would be in the ballpark. In fact, it can get way worse than that if you are unlucky. In the best imaginable case, your write performance is going to take a hit by a factor of three. Usually, much, much worse.
  OK, did we get this straight? As a final exercise, see if you can figure out who was talking nonsense.
  I concede that the terminology used by the MS article is misused. I don't think you're thinking the performance issues through though. You start with a file nicely laid out linearly on disk, and you take a snapshot so you can make a backup. Now you make a modification to the middle of the file and what happens? Suddenly the middle of the file is elsewhere on disk, and in the case of LVM this is invisible to the filesystem so no amount of defragging is going to fix it. This situation persists long after you have taken your backup and thrown the snapshot away. Of course this doesn't matter for flash but we're not all there yet. If BTRFS does snapshots using copy-on-write (correct definition) then this will be a problem too, although if BTRFS is smart enough it should be able to repair the situation once the snapshot is discarded.
  VSS's way leaves the original data in-order on the storage medium. The difference area is likely on a completely different disk anyway so the copy-on-write (MS definition) could not be performed any other way.
40. Re:Can we get a real Linux filesystem, please? by jamesh · 2012-12-14 20:11 · Score: 1
  
  LVM is kind of cool but if you have a single database spread across multiple LV's then you can't snapshot them all as an atomic operation so it becomes useless.
  You're also wrong about that. You can concatenate multiple logical volumes as a single logical volume and snapshot that atomically.
  OK, this is news to me. When I last asked about that it couldn't be done, but that was a few years ago. Google doesn't tell me how I can concatenate (say) my database LV and my logs LV (separate VGs because separate spindles), snapshot them, then un-concatenate them... A link would be appreciated.
41. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 21:08 · Score: 1
  
  lvm lets you concatenate any block devices into a virtual block device
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
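  What that concatenation amounts to, sketched in Python under simplifying assumptions (sector-granular sizes, a hypothetical class name): a linear mapping lays the member devices end to end and translates each virtual sector into a (device, offset) pair, which is conceptually what device-mapper's linear target does. It only shows the address mapping; it says nothing about combining LVs that already carry mounted filesystems.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


class LinearConcat:
    """Lay backing devices end to end and map virtual sectors onto them."""

    def __init__(self, sizes: List[int]) -> None:
        if not sizes or any(s <= 0 for s in sizes):
            raise ValueError("need at least one device, all with positive size")
        # starts[i] is the first virtual sector that lands on device i
        self.starts = [0] + list(accumulate(sizes))[:-1]
        self.total = sum(sizes)

    def map(self, sector: int) -> Tuple[int, int]:
        """Return (device index, sector within that device) for a virtual sector."""
        if not 0 <= sector < self.total:
            raise IndexError(f"sector {sector} outside 0..{self.total - 1}")
        dev = bisect_right(self.starts, sector) - 1
        return dev, sector - self.starts[dev]


concat = LinearConcat([100, 50, 200])  # three devices, sizes in sectors
print(concat.map(120))  # (1, 20): sector 20 of the second device
```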
42. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 21:14 · Score: 0
  
  Hah. A little RAID5 controller is not a "real storage array", buddy.
43. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 22:35 · Score: 2, Informative
  
  Modifications in the middle of files are extremely rare. It's true, running a database on top of a snapshotted spinning disk is probably going to suck. For normal users, keeping regular files mostly linear, and files in the same directory nearby each other is what matters, and yes, Btrfs does a credible job of that.
  I know why shadow copy works the way it does. 1) It's simple, therefore likely to work. 2) It's an easy answer to the "how do you control fragmentation" question. But the write performance issue is so bad that it's a poor solution no matter how you justify it. It's just an attempt to get away with being lazy for a largely uncritical audience that isn't big into benchmarking, or indeed, isn't used to good disk performance.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
44. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 22:37 · Score: 0
  
  Sounds like Btrfs envy. Question is, can they get it to work reliably?
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
45. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 22:55 · Score: 0
  
  From Wikipedia: "COW may also be used as the underlying mechanism for snapshots provided by logical volume management and Microsoft Volume Shadow Copy Service."
  It doesn't matter whether the copy holds the original or the new data; the relevant point is that in the end there must be two versions of the data, the old and the new. Where they are placed is irrelevant; it's an implementation detail. In the case of filesystems, one strategy works better with one technology and the other with a newer one.
46. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-14 23:01 · Score: 0
  
  When you write to a block that is not the original, you are also redirecting on write. MS's implementation is also a COW strategy, just with different implementation details.
47. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-14 23:30 · Score: 1
  
  Sure, Microsoft abuses the CoW terminology and Wikipedia documents that. More politely than necessary, IMHO.
  Copy-on-write leaves the original data unchanged. Copy-on-write makes a private copy, leaving the original unchanged. Microsoft has a different definition, but then Microsoft has a lot of different definitions. Let's you and me be precise about it, and avoid the terminology that Microsoft has wantonly polluted in its ignorance. Copy-before-write or redirect-on-write.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
48. Re:Can we get a real Linux filesystem, please? by MikeBabcock · 2012-12-15 00:10 · Score: 1
  
  Funny, even my home box uses LVM over dm-crypt over RAID on Linux just fine. And that's with Ext4 file systems.
  LVM lets me create a snapshot for consistent backups any time I want.
  
  --
  - Michael T. Babcock (Yes, I blog)
49. Re:Can we get a real Linux filesystem, please? by MikeBabcock · 2012-12-15 00:13 · Score: 1
  
  Totally aside from your main point, what does the spindle count have to do with your VG naming?
  pvcreate /dev/sda1
  pvcreate /dev/sdb1
  pvcreate /dev/sdc1
  vgcreate LotsOfDrives /dev/sda1 /dev/sdb1 /dev/sdc1
  Now if you want spindle-specific LVs:
  lvcreate -n dbdata -l 100%PVS LotsOfDrives /dev/sdb1
  lvcreate -n logdata -l 100%PVS LotsOfDrives /dev/sdc1
  
  --
  - Michael T. Babcock (Yes, I blog)
50. Re:Can we get a real Linux filesystem, please? by MikeBabcock · 2012-12-15 00:16 · Score: 0
  
  When you have a filesystem that understands hard links, deduplication is redundant.
  
  --
  - Michael T. Babcock (Yes, I blog)
51. Re:Can we get a real Linux filesystem, please? by jamesh · 2012-12-15 00:27 · Score: 2
  
  I'm still not getting how you can simultaneously snapshot dbdata (optimised for read and write) and logdata (optimised for write) as an atomic operation. "Tough Love (215404)" said "concatenate them together" but I don't get what that means in this context.
  Last time I checked you would still have to snapshot one, then the other, and the resulting snapshots are almost certainly not going to give you a consistent backup because there would have been writes between the first and the second snapshots.
52. Re:Can we get a real Linux filesystem, please? by aix+tom · 2012-12-15 01:45 · Score: 1
  
  Which of course you can do, but then you can't have the database LV and the log LV on different physical disks any more, which is what was asked.
  Can you post an example of how you would concatenate two existing LVs, with existing file systems on them, mounted and being modified at the time, into a "new virtual block device" without even unmounting them, and then make a consistent snapshot of them?
53. Re:Can we get a real Linux filesystem, please? by LWATCDR · 2012-12-15 02:12 · Score: 1
  
  I would say that you should look at BSD then. If you are willing to go open source anyway, FreeBSD offers ZFS. Too bad more hardware and software companies do not support BSD as well as Linux.
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
54. Re:Can we get a real Linux filesystem, please? by drinkypoo · 2012-12-15 02:27 · Score: 1
  
  There's still a way to go before ZFS on linux gets to where it is on FreeBSD but it's still early days, and for many usage patterns it looks like it is ready for production.
  Can I get it as just a module, or do I need to build a custom kernel package? I can do that, but I prefer not to.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
55. Re:Can we get a real Linux filesystem, please? by LWATCDR · 2012-12-15 02:33 · Score: 2
  
  Wow, just how many clueless people are on Slashdot posting as ACs?
  "No good deduplication software? Don't put duplicate data on the system in the first place!"
  Okay Sparky, you have 5000 users on a server and they all save that email about vacation time or the pictures from the office party. Redundant data. This is a large system with lots of users; it is not your leet Linux box in your mom's basement. Your plays on Microsoft's name are also childish and overdone. Now, there is a valid argument that deduplication of data should be done at the array level so that it does not need to be filesystem-dependent, but that is an argument for knowledgeable adults and not for the likes of you.
  You may go now, you bore me. You may come back when you learn enough to be interesting.
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
56. Re:Can we get a real Linux filesystem, please? by smallfries · 2012-12-15 02:37 · Score: 1
  
  When you have a filesystem that understands hard links, deduplication is still required to find files that have the same content and link them together. You are possibly thinking of a filesystem that hashes contents to decide on storage locations.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
57. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-15 02:42 · Score: 0
  
  Fuck off, that's such a cop out.
  *Storage* costs are far cheaper than RAM. Instead of buying huge amounts of RAM, you could just not use dedupe and buy more storage.
58. Re:Can we get a real Linux filesystem, please? by T-Ranger · 2012-12-15 05:01 · Score: 1
  
  If you ... your employer ... are prepared to spend money, then why not spend money? I mean, and this is a serious question, why not go with something like an EMC VNX or VNXe? Byte for byte of real physical storage, SANs are pretty expensive, I grant, but the features can often make up for that.
59. Re:Can we get a real Linux filesystem, please? by TCM · 2012-12-15 05:16 · Score: 1
  
  What a load of BS. What if two files happen to have the same content, but shouldn't really be tied to each other?
  Two hardlinked files are forever stuck together until you unlink them manually, down to their file access times and everything. If I write to one, the other changes.
  Deduplication doesn't have this semantic tie. Two files happen to have the same content? Fine, save space. But write to one file and the other stays as it was. Plus you _still_ have hardlinks if you want to create a semantic connection.
  Not to even mention the fact that deduplication also works if only parts of the files are common.
  So please, think before you post.
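  A minimal shell sketch of the semantic difference described above, assuming a POSIX shell on Linux; the file names are made up. A plain `cp` stands in for the "independent file with the same content" side. It is not deduplication itself; it only shows that two separate inodes do not track each other's writes the way two hard links (one inode, two names) do.

```sh
#!/bin/sh
# Hard link vs. independent copy; file names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'original\n' > "$dir/a.txt"
ln "$dir/a.txt" "$dir/hard.txt"   # second name for the same inode
cp "$dir/a.txt" "$dir/copy.txt"   # separate inode holding the same bytes

printf 'changed\n' > "$dir/a.txt" # truncate and rewrite the shared inode

hard_content=$(cat "$dir/hard.txt")  # follows the write
copy_content=$(cat "$dir/copy.txt")  # unaffected by it
echo "hard link: $hard_content"
echo "copy:      $copy_content"
```

  A deduplicating filesystem aims for the storage savings of the first case while keeping the independent-write behaviour of the second.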
  
  --
  Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
60. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-15 05:23 · Score: 0
  
  You say: "Copy-on-write leaves the original data unchanged" and VSS leaves the original data unchanged.
  The implementation details don't change the logical concept. COW says: "The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, they can all be given pointers to the same resource" but it looks like you don't get it.
61. Re:Can we get a real Linux filesystem, please? by dna_(c)(tm)(r) · 2012-12-15 05:24 · Score: 2
  
  [...]I just got out of a meeting at my job [...]and because Linux has no stable filesystem with enterprise features [...]
  Sure, AC has some real complex stuff to handle on an enterprise level. That's why all the big boys like Google, Facebook and Twitter are using Windows to host their data...
  You're either a silly moron, a self deluding enterprisy [a-z]+architect or a very capable troll.
62. Re:Can we get a real Linux filesystem, please? by TCM · 2012-12-15 05:26 · Score: 1
  
  So the also non-existent data integrity is the reason they don't have deduplication? Why don't you just say "Yes, we don't have a real filesystem" instead of these laughable arguments?
  
  --
  Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
63. Re:Can we get a real Linux filesystem, please? by iggymanz · 2012-12-15 05:38 · Score: 1
  
  OpenSolaris is long dead. OpenIndiana has never put out a stable release and never met their 2011 Q1 stable release target. They put out a development release once in a while, but that is NOT production grade, nor maintained at a level suitable for production use.
64. Re:Can we get a real Linux filesystem, please? by thrift24 · 2012-12-15 06:22 · Score: 1
  
  Why would you spread a database over multiple Logical Volumes? That just sounds like a poorly engineered LVM setup. Am I wrong?
65. Re:Can we get a real Linux filesystem, please? by thrift24 · 2012-12-15 06:29 · Score: 1
  
  Linux absolutely has production level encryption through the device mapper and support for snapshots with LVM.
  
  Data deduplication is something I'm not as familiar with, but Microsoft just got support for this in Windows Server 2012, and Linux has had some dedup support for at least this long. I don't know how production-ready either is, but I'm pretty sure I don't trust your accuracy on the matter after your previous claims.
66. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-15 07:09 · Score: 0
  
  Yeah, this is silly; virtually anything can saturate GigE nowadays. It hasn't been a challenge for years. Software RAID ext3? Sure. Even $50 hardware RAID cards? Sure. Not exactly a challenge.
67. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-15 07:13 · Score: 0
  
  Stable for university mucking about, but hardly 'production ready' for anyone other than the guy with a few servers in a colo selling overpriced webdesign/hosting/techsupport.
68. Re:Can we get a real Linux filesystem, please? by Zero__Kelvin · 2012-12-15 08:22 · Score: 2
  
  "I just got out of a meeting at my job because we are replacing some old large servers... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)"
  Somebody notify the millions of Enterprise servers that are Linux-based and serving up a major portion of the internet's content every day! Talk about throwing the baby out with the bathwater. Basically, you don't want to take a chance that established filesystems that have been in use in a corporate environment for well over a decade might fail someday, so you are considering going with an OS that is known to be unstable and requires regular reboots just to keep its security "up to date" (which doesn't mean secure). Bravo! Way to totally foul up a basic system analysis!
  
  Somebody should invent RAID and regular backups!
  
  --
  Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
69. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-15 09:01 · Score: 1
  
  Dear Microsoft spinmods: you don't change the fact that your volume snapshots suck by modding down my post.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
70. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-15 09:05 · Score: 1
  
  You say: "Copy-on-write leaves the original data unchanged" and VSS leaves the original data unchanged.
  The implementation details don't change the logical concept. COW says: "The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, they can all be given pointers to the same resource" but it looks like you don't get it.
  I did not say that VSS leaves the original data unchanged, I said the opposite. And this is not an "implementation detail", it's a fundamental property of the operation. And could you please read the next sentence after the one you quoted from Wikipedia, it invalidates your argument. And could you please stop chewing on my toes and learn something about computer science.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
71. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-15 09:10 · Score: 1
  
  Sounds like Btrfs envy. Question is, can they get it to work reliably?
  Here is an informative post that details why Microsoft's ReFS sucks and you don't need to care about it. Even if it works reliably, which is not at all assured (see many reports on the net of issues), this filesystem is pathetically feature-poor. What's the point?
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
72. Re:Can we get a real Linux filesystem, please? by jamesh · 2012-12-15 10:32 · Score: 1
  
  Why would you spread a database over multiple Logical Volumes? That just sounds like a poorly engineered LVM setup. Am I wrong?
  The idea is to spread it over separate underlying disks or RAID sets. MSSQL and Exchange transaction logs are pretty much write-only. The databases themselves are read/write, obviously, but still might be read-mostly or write-mostly. By putting them on separate arrays you can optimize the caching, RAID type, and RAID stripe size in each array for its intended purpose. Even spreading different database tables over different arrays can help too, depending on the usage patterns.
  Oracle has similar recommendations for their database setups too.
  Even under a basic Linux setup with / in one lv, /var in another, and /home in another, the delay between snapshotting each one isn't desirable, although it is unlikely to have any real-world impact.
73. Re:Can we get a real Linux filesystem, please? by jamesh · 2012-12-15 10:39 · Score: 1
  
  Dear Microsoft spinmods: you don't change the fact that your volume snapshots suck by modding down my post.
  Troll is a little harsh... I disagree with you but I know you're not trolling and the discussion is still an Interesting one.
74. Re:Can we get a real Linux filesystem, please? by jamesh · 2012-12-15 10:48 · Score: 1
  
  When you have a filesystem that understands hard links, deduplication is redundant.
  I would argue that maybe it doesn't belong in the filesystem in the first place. If you have a bunch of VMs all with (say) Debian Wheezy, then deduplication in the backend storage would do much more than simple FS deduplication. Some FS knowledge in the storage would be useful (e.g. files with the same name in each FS are probably a good place to start looking for duplicates), but even that is just an optimisation and isn't required.
75. Re:Can we get a real Linux filesystem, please? by dbIII · 2012-12-15 11:04 · Score: 1
  
  By default it builds as a module from source and I don't think anybody is packaging it yet. It seems to use close to 4GB (which seems well over twice what ZFS on FreeBSD appears to be using) so I wouldn't recommend it on anything with less memory than that from what I've seen of it.
76. Re:Can we get a real Linux filesystem, please? by bzipitidoo · 2012-12-15 11:22 · Score: 1
  
  Last time I ran a benchmark, FAT was by far the slowest file system. Ext2, 3 and 4, Reiser 3 and 4, btrfs, xfs, jfs, and even ntfs were all much faster. Each varied on different kinds of loads, but the differences between them were insignificant next to the difference in speed between all of them and FAT. Simplicity often doesn't translate to speed. FAT does many things in brain-dead ways. Let's rewrite the entire file for every tiny change, and do it right away, no caching! Insert a little something in the middle? Rewrite all the data that comes after it! And don't even try to at least defragment a little bit while doing so.
  
  --
  Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
77. Re:Can we get a real Linux filesystem, please? by drinkypoo · 2012-12-15 11:56 · Score: 1
  
  It seems to use close to 4GB (which seems well over twice what ZFS on FreeBSD appears to be using) so I wouldn't recommend it on anything with less memory than that from what I've seen of it.
  That's an awful lot for a filesystem. What does it use on slowlaris?
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
78. Re:Can we get a real Linux filesystem, please? by Lennie · 2012-12-15 12:17 · Score: 1
  
  It depends on your needs.
  Take for example the TOP500: if I'm not mistaken, more than 50% of those systems use Lustre as the filesystem, which is obviously Linux-based.
  I think both Ceph ("inspired" by Lustre) and btrfs are interesting, and I'm sure they'll be more than production-ready next year.
  Hopefully with bcache in the mainline kernel too.
  
  --
  New things are always on the horizon
79. Re:Can we get a real Linux filesystem, please? by dbIII · 2012-12-15 12:46 · Score: 1
  
  I'm not sure; my Solaris boxes don't have a lot of storage, so I haven't touched ZFS on Solaris. I've got it running on two FreeBSD machines, one with a total of 2GB memory, and total memory usage rarely goes above 512MB (it went past that when I was moving a 350GB file), so it looks like just a sign that the Linux version is still in its early days. I'm moving the 4GB Linux machine over to FreeBSD this week since all the memory slots are used.
  I haven't used it a lot, but so far FreeBSD with the ports collection looks to me a lot like what Gentoo Linux was intended to become.
80. Re:Can we get a real Linux filesystem, please? by drinkypoo · 2012-12-15 13:28 · Score: 1
  
  I haven't used it a lot, but so far FreeBSD with the ports collection looks to me a lot like what Gentoo linux was intended to become.
  Maybe I'll look at it again for my next filer. Last time it seemed to be annoying for the sake of being annoying, which was also my impression of the FreeBSD users I knew personally, but that doesn't mean they're all like that. I've used NetBSD and OpenBSD and even 4.3BSD-lite on ROMP, but not FreeBSD.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
81. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-15 14:07 · Score: 1
  
  Which of course you can do, but then you can't have the database LV and the log LV on different physical disks any more, which is what was asked. Can you post an example of how you would concatenate two existing LVs, with existing file systems on them, mounted and being modified at the time, into a "new virtual block device" without even unmounting them, and then make a consistent snapshot of them?
  You're delusional, "without even unmounting them" appeared nowhere in the discussion above, nor did the concept of making separate filesystems work together atomically. Your assertion about "different physical disks" doesn't make any sense at all. Of course you can combine different physical disks into a single logical volume. You would then create a single filesystem on the logical volume. Look here for examples.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
82. Re:Can we get a real Linux filesystem, please? by LordLimecat · 2012-12-15 14:19 · Score: 1
  
  If ext3 is showing as faster than FAT in your benchmarks, your benchmarks are horribly flawed. A non-journaling filesystem with real metadata is going to be oodles faster than any journaling filesystem.
  Heck, I wouldn't be surprised if NTFS were competitive with ext3; ext3 isn't exactly known as a speed demon.
83. Re:Can we get a real Linux filesystem, please? by thrift24 · 2012-12-15 14:34 · Score: 1
  
  You can make a single LV that is striped across separate underlying disks or RAID sets. That's kind of half the point of LVM.
  
  If you really absolutely wanted a consistent snapshot of the whole filesystem you could just use one LV, although there are of course many good reasons not to do that, but I can't really see a need for /home and /var to be consistent with one another. If you want a DB snapshot, then just snapshot /var; unless your database product is crazy, files shouldn't really be changing anywhere else.
84. Re:Can we get a real Linux filesystem, please? by donaldm · 2012-12-15 18:52 · Score: 1
  
  Actually ext4 is way faster than ext3 and has been out a few years now. I make all my file-systems ext4 including my backup disks and have never had any issues. The only thing I have FAT on is some flash drives that I sometimes use to transfer files to MS Windows machines and it is rare for me to go the other way since I normally don't have anything I want on MS Windows machines.
  
  I did try BtrFS about a year ago, but for home use I found it not worth the effort (actually it is really easy), and I am very familiar with AdvFS and ZFS as well as many other types of file-systems. Oh well, I guess I will wait for BtrFS to become more mainstream.
  
  BTW the original AC post was a very good troll, since it only praised Microsoft file-systems and seemed ignorant of other highly reliable enterprise file-systems, such as ext3 (which is being superseded by ext4) and JFS, to name a few. It must also be noted that JFS is IBM's enterprise file-system that is also run on multimillion disk farms and computer systems, and it is open source, which means it also runs on Linux and is fully supported by IBM.
  
  --
  There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
85. Re:Can we get a real Linux filesystem, please? by donaldm · 2012-12-15 19:17 · Score: 1
  
  I will second a very capable troll. :)
  
  --
  There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
86. Re:Can we get a real Linux filesystem, please? by Rich0 · 2012-12-16 01:17 · Score: 1
  
  btrfs is a step in the right direction, but even now, Linux does not have production-level deduplication (which even Windows has, for crying out loud), encryption, snapshots, or something even close to supplanting LVM2.
  Well, that might be why they're working on btrfs, then. :) I'm not sure about encryption, but everything else on your list is something likely to be in the feature list at some point. It obviously isn't stable yet, but that is a matter of time, and if somebody wanted to make a push to get something stable they'd get there a lot faster with btrfs than reinventing something else.
  btrfs already supports reflink copies (think of a copy that behaves like a hard link on initial copy, but each file tracks its own separate changes, sharing file regions that have not changed). That isn't quite deduplication, but I imagine that somebody will get around to implementing that once things settle down (more important issues like not losing your data to work on right now). If you wanted to do a scan and de-duplicate pass that would be pretty easy, even implemented in userspace (just find files with common regions, delete one, reflink it from the other, and replay the modifications). Obviously directly manipulating the filesystem would be more efficient, and you'd want to build in some kind of index to avoid scans in the filesystem as well.
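  A rough shell sketch of the whole-file case of the scan-and-dedupe pass described above, assuming GNU coreutils (`sha256sum`, `cp --reflink`) and plain file names; `dedupe_dir` and the `.dedupe.tmp` suffix are hypothetical. It does not handle shared regions inside files (the "replay the modifications" part), and on a non-btrfs filesystem `--reflink=auto` degrades to an ordinary copy, so no space is saved there. Given that this story is about hash collisions, the sketch confirms every hash match byte-for-byte with `cmp` before acting on it.

```sh
#!/bin/sh
# Whole-file scan-and-dedupe pass in userspace (sketch).
# Assumptions: GNU coreutils, and file names without newlines,
# backslashes, or trailing spaces. dedupe_dir is a hypothetical helper.
set -eu

dedupe_dir() {
    prev_digest=""
    prev_file=""
    # Hash every regular file; sorting puts identical hashes next to each other.
    find "$1" -type f -exec sha256sum {} + | sort |
    while read -r digest file; do
        # Never trust a hash match alone: confirm byte-for-byte with cmp.
        if [ "$digest" = "$prev_digest" ] && cmp -s "$prev_file" "$file"; then
            # Replace the duplicate with a reflink of the first copy.
            # On btrfs the two files then share extents but stay independent;
            # elsewhere --reflink=auto falls back to an ordinary full copy.
            # Note: the duplicate's own mode and ownership are not preserved.
            cp --reflink=auto "$prev_file" "$file.dedupe.tmp"
            mv "$file.dedupe.tmp" "$file"
            echo "deduped: $file"
        else
            prev_digest=$digest
            prev_file=$file
        fi
    done
}

# Usage on a throwaway directory:
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'same bytes\n' > "$demo/one"
printf 'same bytes\n' > "$demo/two"
printf 'different\n' > "$demo/three"
dedupe_dir "$demo"
```

  Because `sort` has to read all of its input before emitting anything, `find` finishes before the loop starts, so the temporary files are never picked up by the scan.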
  Snapshots are already fully supported on btrfs - and can be done at the level of the root or any folder within the filesystem. A file-level snapshot is just a reflink. Snapshots are first-class citizens and can be mounted, resnapshotted, etc.
  Btrfs can also have quotas at any level of the filesystem, so that basically covers your lvm2 need. It can expand across multiple storage devices, with a few raid-like options supported now, and with more likely to come. You can also tag individual files and tell the system to store just that file with increased redundancy.
  Btrfs is basically the future of linux filesystems - I don't really hear anybody disputing that. It just isn't quite the present, hence ongoing efforts around ext4.
87. Re:Can we get a real Linux filesystem, please? by kasperd · 2012-12-16 01:39 · Score: 1
  
  A filesystem approach is the only way to ensure end-to-end data integrity
  Integrity checks in the file system certainly provide much better guarantees than integrity checks on the storage level. And anybody designing file systems today should build integrity checks into their file systems. But the higher a layer you move the integrity checks to, the closer you get to real end-to-end integrity. File system integrity checks don't protect data while it is sitting in memory.
  
  If you copy a file from one file system to another, it can still be corrupted in transit. Even if both source and destination file systems have built-in integrity checks, the copy could get corrupted in the process. But if the source and destination file systems both have integrity checks, they could provide some API to facilitate simpler integrity checks at the higher level. For example, if both source and destination file systems use a hash-tree for integrity checks, there could be an ioctl to retrieve the root of said hash-tree. Then the cp command could call this after copying and compare hashes of source and destination.
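  Without the hash-tree-root ioctl proposed above (no such interface is assumed here), a userspace tool can only reread both files in full and compare. A minimal shell sketch of that fallback, assuming GNU coreutils; `verified_copy` and `content_hash` are hypothetical helper names.

```sh
#!/bin/sh
# Copy a file, then compare content hashes of source and destination.
# Assumes GNU coreutils; verified_copy and content_hash are hypothetical.
set -eu

content_hash() {
    # sha256sum reading stdin prints "<hash>  -"; keep the hash field only.
    sha256sum < "$1" | cut -d' ' -f1
}

verified_copy() {
    cp "$1" "$2"
    if [ "$(content_hash "$1")" != "$(content_hash "$2")" ]; then
        echo "checksum mismatch: $1 -> $2" >&2
        return 1
    fi
    echo "verified: $2"
}

# Usage on a throwaway directory:
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'payload\n' > "$tmp/src"
verified_copy "$tmp/src" "$tmp/dst"
```

  Rereading the destination right after writing it will usually be served from the page cache, so this catches corruption in the copy path but not necessarily what reached the disk; that is part of why exposing the filesystem's own checksums would be attractive.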
  
  --
  
  Do you care about the security of your wireless mouse?
88. Re:Can we get a real Linux filesystem, please? by petermgreen · 2012-12-16 02:53 · Score: 2
  
  Also, ZFS is an insane thing written by people who don't seem to understand that keeping a good separation of concerns can lead to a rather slick set of general tools that can be used on almost any fs.
  Separating stuff into layers has benefits but it also has costs. Sometimes merging layers can make things practical that aren't practical with them separate. Afaict this is what drove the creation of zfs and btrfs.
  Let's first look at RAID. Traditional RAID provides protection against reads that fail but not against reads that silently return wrong data. Experience has shown that hard drives cannot be trusted not to silently return wrong data. Worse still, RAID resyncs after power failure may silently overwrite good data with corrupt data. Adding checksums in the RAID layer is difficult because there is nowhere good to put them (you can't just make the blocks slightly smaller because filesystems expect power-of-two block sizes). Putting checksums in the filesystem helps a bit, but even if there is an API to request the "other copy" when the filesystem detects corrupt data, the aforementioned resync may have already overwritten the good version with the bad one. By moving the responsibility for storing data redundantly into the filesystem we can avoid this problem: when doing a consistency check, the filesystem can check both copies against the checksum it keeps and ensure it overwrites the bad version with the good one rather than vice versa.
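  A toy shell model of that checksum-aware consistency check, assuming GNU coreutils; each "disk" is a file, and `scrub` and `content_hash` are hypothetical names. Real btrfs and ZFS checksum individual blocks and keep the checksums in metadata, so this shows only the decision a stored checksum makes possible: knowing which mirror copy is wrong before copying anything.

```sh
#!/bin/sh
# Toy model of a checksum-aware scrub over a two-way mirror.
# Each "disk" is a file and the checksum is recorded separately, as a
# filesystem would keep it in metadata. All names are hypothetical.
set -eu

content_hash() { sha256sum < "$1" | cut -d' ' -f1; }

# scrub EXPECTED A B: repair whichever mirror copy fails the checksum.
scrub() {
    expected=$1 a=$2 b=$3
    if [ "$(content_hash "$a")" = "$expected" ]; then a_ok=yes; else a_ok=no; fi
    if [ "$(content_hash "$b")" = "$expected" ]; then b_ok=yes; else b_ok=no; fi
    case "$a_ok$b_ok" in
        yesyes) echo "both copies good" ;;
        yesno)  cp "$a" "$b"; echo "repaired $b from $a" ;;
        noyes)  cp "$b" "$a"; echo "repaired $a from $b" ;;
        *)      echo "both copies bad, restore from backup" >&2; return 1 ;;
    esac
}

d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
printf 'good data\n' > "$d/disk_a"
printf 'good data\n' > "$d/disk_b"
checksum=$(content_hash "$d/disk_a")   # recorded when the data was written
printf 'g00d data\n' > "$d/disk_a"     # silent corruption on the first disk

# A checksum-blind resync that trusts disk A would copy disk_a over disk_b.
scrub "$checksum" "$d/disk_a" "$d/disk_b"
```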
  Also traditional raid requires the whole array to have the same level of redundancy. It's possible to work around this by having multiple arrays but that then means you have to manually allocate space between the arrays. Yes there are ways to grow and shrink arrays but it's extra work and may involve downtime. With redundancy at the filesystem layer you should just be able to tell the filesystem what level of redundancy you want for each directory and let the free space be used for any of them.
  Now let's consider snapshots. Snapshots below the filesystem layer mean that you waste effort snapshotting free space. Worse still, if writing to a snapshotted volume works by remapping blocks, then it creates fragmentation in the mapping which is likely to stay around forever. This fragmentation happens even if the blocks were previously free (due to the fact we are snapshotting free space) and may stick around even after the file that caused it is long gone. With snapshots at the filesystem level you don't snapshot free space, and while you still get fragmentation, you only get it when modifying an existing file (not when creating a new file), and it goes away when the file is deleted. Finally, having snapshots at the filesystem layer means you don't have to snapshot the whole filesystem; you can snapshot individual directories within it.
  Now let's consider dedupe. If you do it below the filesystem layer, then to get much benefit from it you have to make your logical devices larger (in aggregate) than your physical devices. That is likely to lead to some very strange errors when you run out of physical blocks but the filesystems still think they have free space. It can also lead to the problem of "garbage" that the dedupe layer thinks needs to be preserved even though it's not actually in use by the filesystem (granted, a TRIM-like API between the filesystem and the dedupe layer could fix this).
  As for encryption, having it as part of the filesystem allows you to choose what you do and don't want encrypted without having to mess around with separate volumes and the administrative overhead thereof (see previous comments about RAID), though how useful this is, and whether it is worth the increased risk (it's too easy to leave clues behind unencrypted), depends hugely on your threat model.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
89. Re:Can we get a real Linux filesystem, please? by aix+tom · 2012-12-16 04:18 · Score: 1
  
  Sorry, but "LVM is kind of cool but if you have a single database spread across multiple LV's then you can't snapshot them all as an atomic operation" actually IMPLIES the "without even unmounting them". It also implies the "different physical disks".
  If you don't know these basic concepts, then please stop with those "Of course it's possible" post, because it is not.
90. Re:Can we get a real Linux filesystem, please? by lsatenstein · 2012-12-16 05:34 · Score: 1
  
  btrfs is a step in the right direction, but even now, Linux does not have production-level deduplication (which even Windows has, for crying out loud), encryption, snapshots, or something even close to supplanting LVM2.
  I just got out of a meeting at my job because we are replacing some old large servers... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)
  This doesn't mean to suck Sun's teat for ZFS access... but at least try to come close to what even NTFS or even ReFS offers...
  ===
  What is the big complaint about btrfs? That it should hatch from the egg already perfect? With the new kernel out this or next week, btrfs will gain major performance improvements. btrfs will surely be a desktop file system until all security issues are resolved. The DoS attack is done by someone able to use the keyboard on your desktop or server. In my opinion the DoS is really an academic study. ZFS and even EXT4 will have some form of weakness. And NTFS too, if you are a Windows server user.
  I've been using btrfs with Fedora 18 beta since November. I can pull the plug, and it recovers nicely. I have not tested all the wonderful features that come with it, but I will.
  ZFS looks interesting too. Am I stuck on one or the other? Benchmarks measuring speed recommend EXT4 as the best choice. I leave the evaluations and recommendations to you, the reader of my reply.
  
  --
  Leslie Satenstein Montreal Quebec Canada
91. Re:Can we get a real Linux filesystem, please? by stoatwblr · 2012-12-16 05:36 · Score: 1
  
  Try using native ZFS instead of zfs-fuse; just make sure you have enough RAM if you want to futz around with deduplication.
92. Re:Can we get a real Linux filesystem, please? by haruchai · 2012-12-16 05:51 · Score: 1
  
  Not at the enterprise level - we've added 96GB of ECC RAM to each of our chassis for about $1700 each.
  Adding 3TB to our SAN, having the disks validated, new LUNs provisioned, etc, cost over $20k.
  
  --
  Pain is merely failure leaving the body
93. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-16 05:57 · Score: 0
  
  Well, I understand that your meaning of unchanged data is different from mine and from a lot of people's: you look at it from a physical point of view. That's not data, it's only a representation, can you get it? By the way, don't use solid state disks, because the mere act of reading can trigger the firmware to move the data and, by your way of thinking, "change" it.
94. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-16 06:07 · Score: 0
  
  I did not say that VSS leaves the original data unchanged, I said the opposite. And this is not an "implementation detail", it's a fundamental property of the operation. And could you please read the next sentence after the one you quoted from Wikipedia, it invalidates your argument. And could you please stop chewing on my toes and learn something about computer science.
  So, if you said the opposite, then you said that VSS changes the original data, didn't you? Which is false. Remember that data is independent of its representation; maybe you have to review your computer science knowledge. Computer science is about logic, discrete mathematics if you prefer, not implementation details.
95. Re:Can we get a real Linux filesystem, please? by LordLimecat · 2012-12-16 07:51 · Score: 1
  
  I wasn't aware that ext3 was considered "highly reliable"; certainly it is a journaling filesystem, but relative to the number of systems of each I've seen, I think I've seen roughly equal numbers of filesystem disasters on both. I've seen chkdsk recover NTFS from some pretty bad states, and I've seen fsck fail spectacularly.
  Ext3 is "reliable" because it has been in service basically forever and because it is journaling, not for any other reason. AFAIK it doesn't have any super-advanced reliability features like checksumming or automatic snapshotting.
96. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-16 11:33 · Score: 1
  
  You're not even talking about LVM, you're talking about making multiple databases operate together atomically. In the context of LVM, that's complete nonsense. It's not an LVM concept, it's a high level application concept. Why are you even wasting bandwidth conflating these issues? If you want to do the job with LVM, you concatenate the volumes and run a single filesystem, or single database on the aggregate volume. Conflating this with the application level consistency somebody dragged into the discussion is just idiotic. By the way, you should keep a lid on the hubris about who knows what.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
97. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-16 12:32 · Score: 1
  
  That's not data, it's only a representation, can you get it?
  Sorry, the person not getting things here is not me. Copy-on-write is a technique of avoiding changing shared data in place. Let's not get all abstract and confused about that, ok? I mean, you can wank on about your data abstractions, but in the process you will also wank away the computer science and miss the point entirely.
  The motivation for not changing the original, shared data in copy-on-write is that we might not know about all the previously existing references to that data, so there may be no practical way to find them all and change them. Microsoft uses a different technique in shadow copy: they do know about all the incoming references from previous snapshots, so they change them to point somewhere else, copy the original data there, then proceed to change the original data. That is not copy-on-write, because it does not avoid changing the original, shared data in place. (See, I spelled out the "in place" thing as a comprehension aid.)
  Two very different techniques, with very different performance characteristics. Copy-on-write is O(1) in the number of incoming references, while copy-before-write is O(N) in the number of incoming references. Copy-before-write requires stalling a write operation for the duration of the copy and metadata update, while copy-on-write does not. These are fundamentally different algorithms, as is apparent from their different complexity characteristics. Irrespective of the computer science involved, Microsoft ignores the subtleties and calls the second thing copy-on-write anyway, opening up plenty of opportunity for wanking on Slashdot by the likes of you. Now, if you want to be precise about it, use the term redirect-on-write every time you actually mean the classic computer science concept copy-on-write (which is documented reasonably well on Wikipedia, except in a certain paragraph containing the word "Microsoft").
  As for your "just a representation" argument, let's extend that. A sorted list is "just a representation" of an unsorted list, therefore sorting costs nothing. Oh wait, that's nonsense. Just as your "that's not data" argument is nonsense.
  Finally, I hope you agree that copy-before-write performance sucks pretty badly compared to classic copy-on-write (redirect-on-write). And by extension, Microsoft's shadow copy feature sucks badly. Confusing the issue by confusing the terminology does help Microsoft avoid criticism over poor performance. But I would not go so far as ascribing to malice what can be more easily explained by incompetence.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
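To make the O(1) versus O(N) contrast concrete, here is a toy Python model of both strategies as described in this comment. `Disk`, `redirect_on_write` and `copy_before_write` are hypothetical names, and the copy-before-write path follows the comment's description of shadow copy, not Microsoft's actual code. The `writes` counter shows the extra physical write that copy-before-write pays on every first overwrite.

```python
# Toy model of the two snapshot strategies described above (hypothetical
# names; this is not how VSS, btrfs or LVM are actually implemented).

class Disk:
    def __init__(self):
        self.blocks = {}      # physical block number -> bytes
        self.next_free = 0
        self.writes = 0       # physical block writes, to compare cost

    def write(self, n, data):
        self.blocks[n] = data
        self.writes += 1

    def alloc(self, data):
        n = self.next_free
        self.next_free += 1
        self.write(n, data)
        return n


def redirect_on_write(disk, live, logical, data):
    """Classic copy-on-write: new data goes to a fresh block.

    The shared original block is never modified, so snapshots that still
    reference it need no bookkeeping at all: O(1) in their number.
    """
    live[logical] = disk.alloc(data)


def copy_before_write(disk, live, snapshots, logical, data):
    """Copy the original aside, repoint every snapshot that referenced it
    (O(N) in their number), then overwrite the original in place."""
    phys = live[logical]
    saved = disk.alloc(disk.blocks[phys])
    for snap in snapshots:
        if snap.get(logical) == phys:
            snap[logical] = saved
    disk.write(phys, data)


# Usage: both approaches preserve the snapshots' view of the old data.
disk = Disk()
live = {0: disk.alloc(b"old")}
snap = dict(live)
redirect_on_write(disk, live, 0, b"new")

disk2 = Disk()
live2 = {0: disk2.alloc(b"old")}
snaps2 = [dict(live2), dict(live2)]
copy_before_write(disk2, live2, snaps2, 0, b"new")
```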
98. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-16 12:36 · Score: 1
  
  you said that VSS changes the original data, didn't you? Which is false
  It's not false, Microsoft clearly documents that they do exactly that: they change the original data in place after saving a copy of it somewhere else.
  As far as computer science goes, I feel more stupid after reading your post; I'll need to stop doing that now.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
99. Re:Can we get a real Linux filesystem, please? by smash · 2012-12-16 14:01 · Score: 1
  
  This is why you use ECC ram.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
100. Re:Can we get a real Linux filesystem, please? by MikeBabcock · 2012-12-16 16:15 · Score: 1
  
  In nearly every case where the files aren't semantically related and hard-linkable, deduplication is silly due to storage costs.
  That's already covered by many others' comments. Do your own thinking.
  
  --
  - Michael T. Babcock (Yes, I blog)
101. Re:Can we get a real Linux filesystem, please? by aix+tom · 2012-12-16 21:11 · Score: 1
  
  I'm NOT talking about multiple databases, I'm talking about the SINGLE database spread across multiple LVs, like Jamesh did.
  It was YOU that came up with the "OH, it's possible with LVMs" nonsense in a reply to him, not me.
102. Re:Can we get a real Linux filesystem, please? by kasperd · 2012-12-16 21:32 · Score: 1
  
  This is why you use ECC ram.
  ECC RAM does reduce the rate of such errors, but it does not eliminate them. I have seen undetected single-bit errors on systems that were entirely using ECC RAM. I am not saying the errors happened in the RAM; they could have happened on the bus or even inside the CPU. Once you start handling multiple PB of data, such errors show up, even if you are using ECC RAM. Using good hardware helps, but no matter how good the hardware you choose is, you shouldn't trust it. You need end-to-end integrity at a higher level; that is how we noticed that undetected bit errors had been introduced by the hardware.
  
  The integrity checks at the lower level only protect data for a small part of the flow. With gaps between the part of the data flow protected by one checksum and the part protected by another, there is a window for errors to be introduced. You can design such low-level integrity checks to overlap, thereby giving you something that is as good at detecting random corruption as an end-to-end integrity check. But designing for such an overlap of integrity checks requires a lot of understanding of how the hardware operates at the very lowest levels. A design with a single end-to-end integrity check has much less risk of introducing windows for corruption through design flaws.
  
  --
  
  Do you care about the security of your wireless mouse?
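A minimal sketch of the end-to-end idea, assuming corruption looks like a random single-bit flip at each hop. `flaky_hop` and `end_to_end_transfer` are hypothetical, and SHA-256 is just one reasonable choice of checksum. The digest is computed once at the producer and verified once at the consumer, so no flip can slip through a gap between per-hop checks.

```python
import hashlib
import random


def flaky_hop(data, rng, p_flip):
    """Hypothetical transfer hop (bus, controller, cache) that silently
    flips one random bit with probability p_flip."""
    buf = bytearray(data)
    if rng.random() < p_flip:
        bit = rng.randrange(len(buf) * 8)
        buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def end_to_end_transfer(data, hops, rng, p_flip):
    """Checksum once at the producer, verify once at the consumer."""
    digest = hashlib.sha256(data).digest()   # computed at the origin
    for _ in range(hops):
        data = flaky_hop(data, rng, p_flip)  # no per-hop checks at all
    return data, hashlib.sha256(data).digest() == digest


# Usage: every hop corrupts something, and the single final check catches it.
received, ok = end_to_end_transfer(b"payload" * 64, 5, random.Random(1), 1.0)
print(ok)
```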
103. Re:Can we get a real Linux filesystem, please? by TCM · 2012-12-17 00:04 · Score: 1
  
  In nearly every case where the files aren't semantically related and hard-linkable, deduplication is silly due to storage costs.
  What do you mean, storage costs?
  Cases where hardlinking is wrong but deduplication works: Mail servers. VM storage.
  
  --
  Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
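A sketch of why deduplication fits the mail-server and VM-storage cases where hard links do not: identical blocks are stored once and reference-counted, but every file keeps its own block list, so rewriting one copy never changes another. `DedupStore` and the 4 KiB block size are hypothetical choices for illustration, not any particular filesystem's design.

```python
import hashlib


class DedupStore:
    """Hypothetical block-level deduplicating store.

    Identical blocks are stored once and reference-counted, but every file
    keeps its own block list. That is the difference from a hard link:
    rewriting one file never changes another.
    """

    BLOCK = 4096  # assumed block size

    def __init__(self):
        self.blocks = {}    # digest -> block contents
        self.refs = {}      # digest -> reference count
        self.files = {}     # file name -> list of digests

    def delete_file(self, name):
        for d in self.files.pop(name):
            self.refs[d] -= 1
            if self.refs[d] == 0:       # last reference gone: free the block
                del self.refs[d], self.blocks[d]

    def write_file(self, name, data):
        if name in self.files:
            self.delete_file(name)
        digests = []
        for off in range(0, len(data), self.BLOCK):
            chunk = data[off:off + self.BLOCK]
            d = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(d, chunk)
            self.refs[d] = self.refs.get(d, 0) + 1
            digests.append(d)
        self.files[name] = digests

    def read_file(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])


# Usage: a cloned VM image costs no extra block storage until it diverges.
store = DedupStore()
image = b"A" * 8192 + b"B" * 4096
store.write_file("vm1.img", image)
store.write_file("vm2.img", image)
```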
104. Re:Can we get a real Linux filesystem, please? by Anonymous Coward · 2012-12-17 04:54 · Score: 0
  
  Remember also why NTFS is preferred over FAT32: larger sectors and better security features. On FAT32, which cannot hold a single file of 4 GiB or more, you have to chop a large 8 GB file into chunks so it will fit, or else move it all at once to NTFS.
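The FAT32 arithmetic as a small sketch: FAT32 records file sizes in a 32-bit field, so one file holds at most 2**32 - 1 bytes. `fat32_chunks` is a hypothetical helper; note that a decimal 8 GB file splits into 2 pieces, while a binary 8 GiB file needs 3, because the limit is one byte short of 4 GiB.

```python
FAT32_MAX_FILE = 2**32 - 1  # largest single file FAT32 can hold, in bytes


def fat32_chunks(size_bytes: int) -> int:
    """Number of pieces a file must be split into to fit on FAT32."""
    return max(1, -(-size_bytes // FAT32_MAX_FILE))  # ceiling division


print(fat32_chunks(8 * 10**9))   # 2
print(fat32_chunks(8 * 2**30))   # 3
```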
105. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-17 05:41 · Score: 1
  
  I'm NOT talking about multiple databases, I'm talking about the SINGLE database spread across multiple LVs, like Jamesh did.
  It was YOU that came up with the "OH, it's possible with LVMs" nonsense in a reply to him, not me.
  Gosh, you're hard to talk to; do people ever tell you that? If it is a single database, then LVM concatenation of the volumes will work perfectly well. If it is multiple databases, then your assertion immediately above is false. There is no third possibility, so what are you going on about?
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
106. Re:Can we get a real Linux filesystem, please? by aix+tom · 2012-12-17 08:49 · Score: 1
  
  Well, you're not easy to talk to either. ;-) I'm talking about ONE database on MULTIPLE LVs (or, in the old days, on multiple RAIDs).
  For example, with Oracle you can put different tables OF THE SAME DATABASE into different tablespaces, on completely different storage devices. For example, put some rarely accessed tables OF THE SAME DATABASE on one LV with a few big, not-so-fast SATA disks, and put other tables OF THE SAME DATABASE on a lot of smaller, faster Fibre Channel disks on a different LV. And/or separate data and redo/undo tablespaces that way. And definitely put different online redo logs on different physical disks / LVs, as recommended here.
  I don't blame any volume management for not making consistent snapshots possible in those scenarios (in fact it's not possible with ANY volume management that I know of, though I don't know ZFS at all); I just wanted to point out that it is indeed impossible to do consistent snapshots in that regard purely at the LVM level.
  With a "small" database that fits on one single LV, I can just snapshot the LV, copy the content, and start the DB on another machine without a hitch. With a "big" database that is spread over multiple LVs, I have to put the database into some sort of online backup mode, snapshot all LVs one after the other, copy the content, and then do a database recovery when I start up the copy to iron out the "discrepancies" between the snapshots.
107. Re:Can we get a real Linux filesystem, please? by Tough+Love · 2012-12-17 09:19 · Score: 1
  
  OK, I see where you're coming from. The Oracle database you describe depends only on the write-completion semantics of independent block devices and does its own recovery, but if random volumes "jump back in time" due to replicating a snapshot, the database can't recover reliably. A well-known issue. You can fix this with LVM, though I will not claim that this is elegant. Concatenate all the physical volumes together, then allocate separate logical volumes on top of that which exactly match the underlying physical volumes. Run the database on those logical volumes. To create a consistent state of all the underlying volumes, pause the database and flush all the logical volumes. You now have a consistent set of physical volumes you can copy somewhere and recover the database from.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
CRC by RedHackTea · 2012-12-14 13:35 · Score: 1

My knowledge of file-systems is minimal. But since it's a CRC attack, can you just turn off the ability of Btrfs to check errors (if that's possible)? However, I'm sure data corruption would then ensue.

Anyway, I'm glad I always use ext4/3. I thought about trying ZFS at one point, but decided that using Solaris as a non-server OS is pointless. Does anyone still use Solaris?

--
The G
1. Re:CRC by Mike+Domanski · 2012-12-14 13:49 · Score: 1
  
  From the linked blog:
  
  Directories are indexed in two different ways. For filename lookup, there is an index comprised of keys:
  Directory Objectid | BTRFS_DIR_ITEM_KEY | 64 bit filename hash
  The default directory hash used is crc32c, although other hashes may be added later on. A flags field in the super block will indicate which hash is used for a given FS.
  Sounds like btrfs uses a CRC as a hash. I assume it's a performance optimization, but using CRC as a hash is insane.
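Why a CRC is a poor hash against an adversary, in one assertion: CRC is affine over XOR, so for equal-length inputs `crc(a) ^ crc(b) ^ crc(c) == crc(a ^ b ^ c)`. An attacker can therefore solve for colliding names with linear algebra instead of searching for them. This is a minimal sketch of standard CRC-32C with the usual initial value and final XOR; btrfs may seed it differently, which shifts a constant but not the linearity. `xor_bytes` is a hypothetical helper.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78,
    with the usual 0xFFFFFFFF initial value and final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def xor_bytes(*msgs: bytes) -> bytes:
    """Bytewise XOR of equal-length messages."""
    out = bytearray(len(msgs[0]))
    for m in msgs:
        for i, b in enumerate(m):
            out[i] ^= b
    return bytes(out)


a, b, c = b"alpha.txt", b"bravo.txt", b"delta.txt"
# CRC is affine over XOR: for equal-length inputs this always holds.
assert crc32c(xor_bytes(a, b, c)) == crc32c(a) ^ crc32c(b) ^ crc32c(c)
```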
2. Re:CRC by Nimey · 2012-12-14 13:53 · Score: 0
  
  It is insane, yes. MD5 shouldn't be /that/ computationally intensive, and even though it's no longer secure enough for cryptography, it should still be good enough for this.
  
  --
  Hail Eris, full of mischief...
  
  E pluribus sanguinem
3. Re:CRC by Anonymous Coward · 2012-12-14 14:01 · Score: 0
  
  Does anyone still use Solaris?
  That's a brand of cooking oil, right?
4. Re:CRC by Anonymous Coward · 2012-12-14 14:20 · Score: 1
  
  For short messages like filenames, MD5 takes 70 times as long to compute as CRC... And since the published attacks on MD5 let you create collisions pretty cheaply, you could still do the same attack.
  If anything you'd use a construct like SipHash, but SipHash requires a secret key and a 64-bit output isn't really collision resistant anyway.
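A sketch of a secret-keyed 64-bit filename hash in the spirit of SipHash. Python's `hashlib` has no SipHash, so keyed BLAKE2b stands in for it here; `KeyedNameHash` and the per-filesystem key are hypothetical. Without the key, an attacker cannot predict which names will land on the same index key.

```python
import hashlib
import os


class KeyedNameHash:
    """64-bit filename hash keyed with a secret (keyed BLAKE2b standing in
    for SipHash, which hashlib does not provide)."""

    def __init__(self, key=None):
        # Hypothetical: a random per-filesystem secret chosen at creation
        # time. Because the index lives on disk, the key would have to be
        # persisted too, and kept away from unprivileged users.
        self.key = key if key is not None else os.urandom(16)

    def __call__(self, name: bytes) -> int:
        digest = hashlib.blake2b(name, key=self.key, digest_size=8).digest()
        return int.from_bytes(digest, "little")


h = KeyedNameHash()
print(h(b"report.txt"))  # differs from one key (filesystem) to the next
```

As the comment notes, 64 bits is not collision resistant in the cryptographic sense; the keyed hash only spreads names across the index, and genuine collisions are still resolved by comparing full names.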
5. Re:CRC by Anonymous Coward · 2012-12-14 14:21 · Score: 0
  
  My knowledge of file-systems is minimal. But since it's a CRC attack, can you just turn off the ability of Btrfs to check errors (if that's possible)? However, I'm sure data corruption would then ensue.
  Anyway, I'm glad I always use ext4/3. I thought about trying ZFS at one point, but decided that using Solaris as a non-server OS is pointless. Does anyone still use Solaris?
  Using Solaris 11 in a non-server role is not any worse an experience than say a Linux desktop from a couple years ago, or Debian... :P
  Operating systems should not be a popularity contest. Solaris isn't a perfect OS, but there is a lot worth learning from it, like the fact that "stable interface" doesn't have to mean "principal developer graduated from college and is no longer maintaining it."
6. Re:CRC by Tough+Love · 2012-12-14 14:46 · Score: 1
  
  a 64-bit output isn't really collision resistant anyway
  Plenty good enough for a hashed directory key, which doesn't need to be cryptographically secure, just to have good distribution, with results affected as much as possible by every input bit. The size of the output is not the dominant factor; the quality of the input mixing is.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
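"Quality of input mixing" can be measured with an avalanche test: flip one input bit and count how many output bits change; a good mixer changes about half of them. `fmix64` is MurmurHash3's 64-bit finalizer, used here only as an example of strong mixing; `avalanche` is a hypothetical helper. The identity function, with no mixing at all, scores exactly 1.

```python
import random

MASK64 = (1 << 64) - 1


def fmix64(k: int) -> int:
    """MurmurHash3's 64-bit finalizer, an example of a strong bit mixer."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def avalanche(fn, samples=200, seed=0):
    """Average number of output bits (out of 64) that change when a single
    input bit is flipped, over random 64-bit inputs."""
    rng = random.Random(seed)
    changed = trials = 0
    for _ in range(samples):
        x = rng.getrandbits(64)
        for bit in range(64):
            changed += bin(fn(x) ^ fn(x ^ (1 << bit))).count("1")
            trials += 1
    return changed / trials


print(avalanche(fmix64))       # close to 32: each flip changes about half
print(avalanche(lambda x: x))  # 1.0: no mixing at all
```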
7. Re:CRC by Anonymous Coward · 2012-12-14 14:53 · Score: 0
  
  In a number of companies, they are scaling from old school iron (SPARC, POWER) to dense, x86 blades.
  Netbackup is a good example of this. If I use its media server deduplication, it requires one filesystem for the disk pool. So, being able to move disks in order to make that one filesystem as useful/big as possible is important. I did this under Red Hat, but then found out that they required additional money for an officially supported XFS extension (yes, I could load in CentOS modules, but then I would have an unsupported production cluster.) In this case, Solaris and ZFS came in extremely handy.
  Of course, Windows Server 2012 and its disk pools are very useful, but Netbackup has not blessed that yet.
  I would also say, that even though I have multiple certs with Linux, it does not have anywhere near the production features of Solaris or AIX.
  For example, one LPAR I had was meant to be completely locked down due to handling incoming log traffic. With a few trustchk invocations, any attempts to replace binaries with unsigned ones will result in errors, and with any type of HIDS in place, will be detected as soon as one of the binaries is attempted to be executed. Heck, one can even lock down root so anything running as UID 0 just runs as a plain old user... and any system changes have to be done by shutting the LPAR down, booting from another root volume group, and doing them there.
8. Re:CRC by maxwell+demon · 2012-12-14 19:46 · Score: 2
  
  Or just use an RB tree instead of a linear list for hash collisions; then you get O(log n) instead of O(n) worst-case search performance.
  To quote Wikipedia:
  Instead of a list, one can use any other data structure that supports the required operations. For example, by using a self-balancing tree, the theoretical worst-case time of common hash table operations (insertion, deletion, lookup) can be brought down to O(log n) rather than O(n). However, this approach is only worth the trouble and extra memory cost if [...] one must guard against many entries hashed to the same slot (e.g.[...] in the case of web sites or other publicly accessible services, which are vulnerable to malicious key distributions in requests).
  While a file system is not generally publicly available (actually it may be, if e.g. used on an FTP server), it is still shared.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
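A sketch of the failure mode being guarded against: a textbook hash table with plain-list buckets, where an attacker makes every name collide. `ChainedTable`, `insert_cost` and `good_hash` are hypothetical; keyless BLAKE2b merely stands in for a well-distributed hash. With all names in one bucket, inserting n names costs n(n-1)/2 comparisons.

```python
import hashlib


class ChainedTable:
    """Textbook hash table whose buckets are plain lists (linear chains)."""

    def __init__(self, hash_fn, nbuckets=1024):
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(nbuckets)]
        self.comparisons = 0   # key comparisons, as a proxy for work

    def insert(self, key, value):
        bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            self.comparisons += 1
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))


def good_hash(name):
    """Well-mixed 64-bit hash (keyless BLAKE2b), used only for comparison."""
    digest = hashlib.blake2b(name.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def insert_cost(hash_fn, n):
    """Comparisons needed to insert n distinct names."""
    table = ChainedTable(hash_fn)
    for i in range(n):
        table.insert(f"file{i}", i)
    return table.comparisons


print(insert_cost(good_hash, 1000))      # names spread out: few comparisons
print(insert_cost(lambda name: 0, 1000)) # every name collides: 499500
```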
9. Re:CRC by maxwell+demon · 2012-12-14 19:47 · Score: 1
  
  Does anyone still use Solaris?
  That's a brand of cooking oil, right?
  No, it's a novel by Stanisław Lem.
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
10. Re:CRC by Anonymous Coward · 2012-12-14 23:37 · Score: 0
  
  This is a tree, I'm pretty sure, not a hash table as some seem to assume. (A B-tree, which is better suited to file systems than a red-black tree)
  They're storing a tree index of "file name hash code" -> "inode" mappings, for fast lookup of file names.
  Thing is, an index is only as good as your keys, so when the keys collide, you don't get a performance benefit anymore.
  I'm not sure why they're indexing the hash instead of the full filename; it could be to save space and for performance reasons, or it could be that their B-tree implementation can't handle variable-length keys.
11. Re:CRC by MikeBabcock · 2012-12-15 00:18 · Score: 1
  
  There are much more efficient hashes than MD5 that would work as well for fewer clock cycles. http://cr.yp.to/hash127.html comes to mind.
  
  --
  - Michael T. Babcock (Yes, I blog)
12. Re:CRC by cpghost · 2012-12-15 00:42 · Score: 1
  
  I thought about trying ZFS at one point, but decided that using Solaris as a non-server OS is pointless. Does anyone still use Solaris?
  Have you thought about using ZFS on FreeBSD? Running FreeBSD/amd64 here on a desktop machine with ZFS file systems without any problems.
  
  --
  cpghost at Cordula's Web.
13. Re:CRC by Wolfrider · 2012-12-15 10:18 · Score: 1
  
  --Yah, the movie version with George Clooney had a pretty hot chick in it, too... :D
  
  --
  .
  == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
14. Re:CRC by maxwell+demon · 2012-12-15 21:45 · Score: 1
  
  So it's effectively a hash table where the hash is stored in a B-tree instead of being used as an array index. It still has the base characteristics of a hash table, though:
  Step 1: You calculate a hash from your key.
  Step 2: You map that hash key to a container (in a standard hash table: Use it as index; in btrfs: look it up in a B-tree)
  Step 3: You seek your actual item in that container (in the usual hash table, and apparently also in btrfs: a linear list; to protect against malicious attacks: a balanced tree).
  What you described is how step 2 differs in btrfs from the usual hash table. My comment was about how to do better on step 3. And yes, a B-tree would be a better option than an RB-tree there. When I wrote "RB tree" I actually meant "balanced tree".
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
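The three steps above can be sketched as a hypothetical directory index (not btrfs code). A sorted Python list searched with `bisect` stands in for the B-tree of step 2, and `zlib.crc32` stands in for the filename hash of step 1; a real B-tree would also avoid the O(n) list insertions used here. Step 3 stays a linear scan, which is exactly what deliberate collisions exploit.

```python
import bisect
import zlib


def crc_name(name):
    return zlib.crc32(name.encode())   # stand-in filename hash


class HashIndexedDir:
    """Hypothetical directory index following the three steps:
    1. hash the name,
    2. find that hash in a sorted index (standing in for the B-tree),
    3. scan the names that share that hash."""

    def __init__(self, hash_fn=crc_name):
        self.hash_fn = hash_fn
        self.hashes = []    # sorted hash values
        self.chains = []    # chains[i]: (name, inode) pairs with hashes[i]

    def add(self, name, inode):
        h = self.hash_fn(name)
        i = bisect.bisect_left(self.hashes, h)
        if i < len(self.hashes) and self.hashes[i] == h:
            self.chains[i].append((name, inode))
        else:
            self.hashes.insert(i, h)
            self.chains.insert(i, [(name, inode)])

    def lookup(self, name):
        h = self.hash_fn(name)                     # step 1
        i = bisect.bisect_left(self.hashes, h)     # step 2: O(log n)
        if i == len(self.hashes) or self.hashes[i] != h:
            return None
        for candidate, inode in self.chains[i]:    # step 3: linear scan
            if candidate == name:
                return inode
        return None


d = HashIndexedDir()
d.add("a.txt", 11)
d.add("b.txt", 12)
print(d.lookup("a.txt"))  # 11
```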
Requires local access by Anonymous Coward · 2012-12-14 13:48 · Score: 5, Funny

no more dangerous than a fork bomb or filling up /tmp or trying to compile OpenOffice.
1. Re:Requires local access by cryptizard · 2012-12-14 14:38 · Score: 5, Informative
  
  Sort of, but at least you can recover from those attacks by restarting or booting from an external source to clean up your filesystem. The second attack here leaves you with undeletable files because the file system code responsible for deleting cannot handle the multiple hash collisions. There is no way to recover from that until a patch is pushed out that fixes the problem.
2. Re:Requires local access by blade8086 · 2012-12-14 16:03 · Score: 2
  
  Which, without the oversensationalized BS that is this story, will probably be fixed in about a week, tops.
  And since BTRFS is not in any 'enterprise' Linux distribution, the fix will be available pretty much immediately, since everyone running it in critical production environments will probably be running pretty bleeding-edge linuxen.
3. Re:Requires local access by Anonymous Coward · 2012-12-14 17:27 · Score: 1
  
  Requires local access
  Well, it requires the ability to create named files. That could happen through a Wiki upload page, by extraction of an archive to a temporary folder for processing, etc.
  And unlike filling up /tmp, this will not be stopped by setting a quota.
4. Re:Requires local access by someones · 2012-12-15 00:30 · Score: 1
  
  This will be easily stopped by adding a filename prefix or suffix. There goes this script kiddie's whine about experimental software not being perfect.
5. Re:Requires local access by ArsenneLupin · 2012-12-15 00:52 · Score: 1
  
  Well, it requires the ability to create named files. That could happen through a Wiki upload page, by extraction of an archive to a temporary folder for processing, etc.
  ... or worse, web caches which preserve original file names...
6. Re:Requires local access by drinkypoo · 2012-12-15 02:21 · Score: 1
  
  The second attack here leaves you with undeletable files because the file system code responsible for deleting cannot handle the multiple hash collisions. There is no way to recover from that until a patch is pushed out that fixes the problem.
  There's no filesystem debugger for btrfs?
  Seems to me like fsck ought to be able to solve this problem, too. Two files with the same hash? Delete the one with the newer timestamp.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
7. Re:Requires local access by GrievousMistake · 2012-12-15 02:31 · Score: 1
  
  this will be easily stopped by adding a filename prefix or suffix
  No it won't. It is still easy to make collisions with a known prefix or suffix. You would have to include a random component.
  Even if that was a feasible workaround, it's hardly a common best practice, nor should it be.
  
  There goes this script kiddie's
  He discovered this vulnerability himself, and wrote the attack code; he is by definition not a script kiddie. Never mind that he's a professor and published cryptographer.
  
  whine about experimental software not being perfect.
  This has nothing to do with being experimental software. This is not a bug, it is a weakness in the design. Furthermore, the bad behaviour will not manifest by accident - you have to deliberately provoke it.
  This is the type of problem that isn't fixed before someone finds and reports it -- like Junod did.
  Please cease your inane babbling.
  
  --
  In a fair world, refrigerators would make electricity.
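A sketch of why a fixed prefix or suffix does not help against a CRC-based name hash. It uses `zlib.crc32` (the IEEE CRC-32) as a stand-in for the CRC-32C btrfs uses; both are CRCs with the same linearity. For equal-length names, `crc(P + a) ^ crc(P + b) == crc(a) ^ crc(b)` whatever P is, and equal CRCs mean equal internal state, so a shared suffix keeps them equal; that holds even for a prefix the attacker cannot see. `find_crc_collision` and the prefix/suffix values are hypothetical.

```python
import random
import zlib


def find_crc_collision(seed=0, length=6, limit=2_000_000):
    """Birthday search for two distinct byte strings of equal length with the
    same CRC-32. Random raw bytes keep the sketch short; they are not valid
    filenames, and a real attacker would solve the linear equations directly."""
    rng = random.Random(seed)
    seen = {}
    for _ in range(limit):
        name = rng.getrandbits(8 * length).to_bytes(length, "little")
        crc = zlib.crc32(name)
        if crc in seen and seen[crc] != name:
            return seen[crc], name
        seen[crc] = name
    raise RuntimeError("no collision found within limit")


a, b = find_crc_collision()
prefix, suffix = b"uploads-", b".tmp"   # hypothetical fixed decorations
assert zlib.crc32(a) == zlib.crc32(b) and a != b
# Because CRC is affine, the collision survives any fixed prefix and suffix.
assert zlib.crc32(prefix + a + suffix) == zlib.crc32(prefix + b + suffix)
```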
8. Re:Requires local access by cryptizard · 2012-12-15 02:58 · Score: 2
  
  Two files with the same hash is not a problem; it is allowed. This will happen just by chance many times on your filesystem because the hash is relatively short (64 bits). The problem is when you engineer many files to have the same hash and your data structure (hash table) degrades into a linear list. There is also some other problem in the code here that makes it so the hash table can't store, or for some reason can't process, more than a certain number of collisions.
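How often chance collisions show up depends on directory size and on the effective hash width. Assuming the hash behaves like a uniform random function, the expected number of colliding pairs among n names is n(n-1)/2 divided by 2**bits. CRC-32C itself outputs 32 bits, so the effective width may be narrower than the 64-bit key field quoted from the btrfs blog. `expected_colliding_pairs` is a hypothetical helper.

```python
def expected_colliding_pairs(n_names: int, hash_bits: int) -> float:
    """Expected number of name pairs sharing a hash value, assuming the hash
    behaves like a uniformly random function onto 2**hash_bits values."""
    return n_names * (n_names - 1) / 2 / 2**hash_bits


print(expected_colliding_pairs(65_536, 32))     # about 0.5
print(expected_colliding_pairs(1_000_000, 64))  # about 2.7e-08
```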
9. Re:Requires local access by Anonymous Coward · 2012-12-27 07:34 · Score: 0
  
  Oracle and SUSE support btrfs, with the latter only supporting btrfs, not ext4.
Nice! by gweihir · 2012-12-14 14:21 · Score: 3, Interesting

"Algorithmic Complexity Attacks" like this one have long been known, but rarely been documented publicly. One good example to point out why hash-randomization is a good idea!

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
1. Re:Nice! by blade8086 · 2012-12-14 16:08 · Score: 0
  
  Yah dude, you're so totally spot on - no one at all documents this!
2. Re:Nice! by Anonymous Coward · 2012-12-14 16:31 · Score: 3, Funny
  
  Words, they mean nothing! Take 'rarely' for example, who gives a shit, I'll read it as 'never' same thing.
3. Re:Nice! by Anonymous Coward · 2012-12-14 22:14 · Score: 0
  
  The claim was rarely; the response gave evidence this wasn't true.
  The response used the word "no one", but that was obvious sarcasm.
  So, what's your nitpicking semantic complaint again?
4. Re:Nice! by gweihir · 2012-12-16 10:19 · Score: 1
  
  Well, nice. An example of somebody completely missing the point! This is not about cryptographic hash collisions at all; they are a completely different problem. This is about hash tables, a data structure.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
This just in... by Anonymous Coward · 2012-12-14 14:27 · Score: 0

Letting an asshole have write access to your filesystem can lead to a fucked up filesystem.
1. Re:This just in... by FranTaylor · 2012-12-14 16:29 · Score: 1
  
  Are you saying that google's file systems are corrupt?
2. Re:This just in... by someones · 2012-12-15 00:26 · Score: 1
  
  do you have write access to their filesystem?
  Or do you just have write access to some database where you can tag the data with a "filename"?
Nice this was found before BTRFS goes stable by Anonymous Coward · 2012-12-14 14:31 · Score: 5, Insightful

Hopefully more people start fuzzing btrfs so it is that much better when it is declared stable.
1. Re:Nice this was found before BTRFS goes stable by Anonymous Coward · 2012-12-14 16:31 · Score: 0
  
  ...It's shipping with OpenSUSE right now...as the default.
2. Re:Nice this was found before BTRFS goes stable by Rich0 · 2012-12-16 01:27 · Score: 1
  
  Lots of people have been doing testing on btrfs. Filesystems aren't so much declared as stable as they become used as stable. Unless the fix changes the on-disk format in some non-backwards-compatible way, it doesn't really matter when the fix gets deployed. Most likely the fixes will be in git in a week or two.
  Oh, and anybody who really wants to run btrfs should probably be running the git version anyway. They're doing so many bugfixes per month that this is one of those rare times where the mainline kernel sources are likely to be in much worse shape. Once things settle down that will obviously change.
3. Re:Nice this was found before BTRFS goes stable by Rich0 · 2012-12-16 01:29 · Score: 1
  
  Just one of those issues with running an open-source OS published by the vendor of a proprietary OS. OpenSUSE and Fedora tend to be treated like guinea pigs.
  And this isn't necessarily a bad thing. If you're a RHEL shop then you probably want to have some Fedora test systems to get a sense for how your applications will operate in future versions.
  If you want something free and stable, you run something like Debian or CentOS, or whatever.
Who cares? by UltraZelda64 · 2012-12-14 15:01 · Score: 1

Unstable software that is still under heavy development is actually unstable. Who would've guessed?
I think that based on this ingenious discovery, we should all switch over to it by next week.
Fix the title by Anonymous Coward · 2012-12-14 15:05 · Score: 0

So the FS is vulnerable to an attack. The attack is not in the FS. That's pretty misleading.
Can I install btrfs on windows? by Anonymous Coward · 2012-12-14 15:46 · Score: 0

I mean, just to keep up. You know, with that totally great POS you guys think you know how to use. Luser=Linux user
1. Re:Can I install btrfs on windows? by Anonymous Coward · 2012-12-14 16:07 · Score: 1
  
  Yeah, I'll send you the installer. What's your e-mail address?
2. Re:Can I install btrfs on windows? by FranTaylor · 2012-12-14 16:30 · Score: 0
  
  How's that windows 8 UI experience coming along?
Good god man by tomp · 2012-12-14 15:54 · Score: 2

"Denial-of-Service Attack Found In Btrfs File-System" didn't happen. A vulnerability was found. That's a big deal, no reason to obscure it.
1. Re:Good god man by blade8086 · 2012-12-14 16:13 · Score: 1
  
  No, actually, this is NEITHER a DoS attack nor a vulnerability. It is a *bug*.
  But it's oh so much better to douchebag-promote yourself as the super terducken 31337 hax0r sekuritah expert by mislabeling it and having it get picked up by the tech press.
Attack? by Decameron81 · 2012-12-14 16:18 · Score: 2

An attack was found in the filesystem? What's that supposed to mean?

--
diegoT
1. Re:Attack? by dr2chase · 2012-12-14 17:27 · Score: 1
  
  Carefully chosen file names (a lot of them) can DOS file system performance. Whether this could be escalated to a network vulnerability, hard to say -- if an attacker over the net can figure out a way to induce particular file names on the server, that would be worse.
  It's a little sad that people are still forgetting about this failure mode of hash tables and hash functions; either there's got to be a randomizing secret swizzled in, or a better (more nearly cryptographically strong) hash function, or both.
2. Re:Attack? by Anonymous Coward · 2012-12-14 17:45 · Score: 0
  
  ...or you could use trees, which don't have these problems at all.
3. Re:Attack? by Anonymous Coward · 2012-12-14 17:52 · Score: 0
  
  So ... a vulnerability was found.
  VULNERABILITY. ATTACK. Different words. Different meanings.
4. Re:Attack? by dr2chase · 2012-12-14 17:53 · Score: 1
  
  True, but good random numbers (good hashes) have interesting and powerful statistical properties.
5. Re:Attack? by Anonymous Coward · 2012-12-14 17:56 · Score: 0
  
  bash> touch "An attack"
6. Re:Attack? by Anonymous Coward · 2012-12-14 19:13 · Score: 0
  
  mmm. Well, it seems to me -- and I'm just your average programmer -- that trees have only one downside, and that is, they cost a little more space. But in these days of ridiculously easy to obtain memory, I just don't think that matters for most problems.
  Whereas hashes are a hack where the primary advantage is that they can map a large, but sparsely populated, problem space into a small one, until they can't, and then they begin to suck, and they may generate huge quantities of suck, depending on just how the small end of the map is being insulted.
  I don't see (but feel free to enlighten me, I mean it when I say I'm just an average programmer) how hashes having interesting and powerful statistical properties makes amends for the fact that they can suck really bad, when compared against a technique (trees) that... doesn't suck. :)
7. Re:Attack? by dr2chase · 2012-12-14 19:29 · Score: 1
  
  Read about universal hash functions (the writeup on wikipedia is not that bad). They're not a hack.
  You don't necessarily use a small space, either; a 64-bit hash is not normally regarded as a small space, though it is often smaller than the bit size of what is hashed into it.
  Two problems with trees are that you need to define a comparison (you can often concoct one, but they're not always given to you) and though memory is cheap, *probes* into memory are not. If a hash function can get you there in 1 step with high probability, that's interesting.
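A sketch of a randomized hash family of the kind the universal-hashing writeup describes: polynomial string hashing with a secret random base modulo a Mersenne prime, which is almost-universal rather than strictly universal. For two distinct strings of length at most L, the chance over the choice of base that they collide is roughly L / P61, as long as the base stays secret. `PolyHash` and `P61` are hypothetical names.

```python
import random

P61 = (1 << 61) - 1   # Mersenne prime modulus


class PolyHash:
    """Polynomial string hashing with a randomly chosen base: an
    almost-universal family keyed by the secret base."""

    def __init__(self, rng=None):
        rng = rng or random.SystemRandom()
        self.base = rng.randrange(2, P61 - 1)   # the secret, random choice

    def __call__(self, data: bytes) -> int:
        h = 0
        for byte in data:
            # +1 keeps every coefficient nonzero, so strings of different
            # lengths give different polynomials.
            h = (h * self.base + byte + 1) % P61
        return h

    def bucket(self, data: bytes, nbuckets: int) -> int:
        return self(data) % nbuckets


h = PolyHash()
print(h.bucket(b"report.txt", 1024))  # changes whenever a new base is drawn
```

Drawing the base at mount or table-creation time is what stops an attacker from precomputing colliding names, which a fixed function like CRC cannot offer.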
8. Re:Attack? by maxwell+demon · 2012-12-14 19:56 · Score: 1
  
  The two approaches are not mutually exclusive. A hash table is an array of containers. Usually people use linear lists as containers because they are the simplest, and hash collisions are considered rare, so the O(n) characteristics shouldn't matter. But when hash collisions may be intentionally caused, you should obviously use a container better suited to your problem. Just think about what container you'd use if you weren't able to use a hash table, and then use that same container for the hash table's array entries.
  Or in short, make your hash table an array of balanced trees instead of linked lists. That way you get O(1) typical behaviour (assuming a good hash function) and O(log n) worst case (which includes malicious attacks).
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
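  maxwell+demon's suggestion can be sketched as follows. This is a minimal Python illustration, not how btrfs or any particular library does it: a sorted list searched with `bisect` stands in for the balanced tree, which gives O(log k) lookups within a bucket but O(k) insertion because of the list shift; a real implementation would use a red-black or AVL tree to make insertion O(log k) as well. Keys must be mutually comparable, which is exactly the cost dr2chase mentions. The class name and the `hash_fn` parameter are made up for this example.
  
  ```python
  import bisect
  
  
  class TreeBucketHashTable:
      """Hash table whose buckets are kept sorted, so the search inside a
      bucket is a binary search (O(log k)) instead of a linear scan (O(k)).
      A sorted list stands in for a balanced tree here (see lead-in)."""
  
      def __init__(self, nbuckets=64, hash_fn=hash):
          self.nbuckets = nbuckets
          self.hash_fn = hash_fn
          self.keys = [[] for _ in range(nbuckets)]    # sorted keys per bucket
          self.values = [[] for _ in range(nbuckets)]  # values, parallel to keys
  
      def _bucket(self, key):
          return self.hash_fn(key) % self.nbuckets
  
      def put(self, key, value):
          i = self._bucket(key)
          ks, vs = self.keys[i], self.values[i]
          j = bisect.bisect_left(ks, key)
          if j < len(ks) and ks[j] == key:
              vs[j] = value            # key already present: replace value
          else:
              ks.insert(j, key)        # keep the bucket sorted
              vs.insert(j, value)
  
      def get(self, key, default=None):
          i = self._bucket(key)
          ks = self.keys[i]
          j = bisect.bisect_left(ks, key)
          if j < len(ks) and ks[j] == key:  # full key compare, not just the hash
              return self.values[i][j]
          return default
  ```
  
  Even with a hostile `hash_fn=lambda k: 0` that puts every key in one bucket, lookups stay correct and take O(log n) comparisons instead of O(n).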
9. Re:Attack? by Noughmad · 2012-12-14 20:11 · Score: 1
  
  An attack was found in the filesystem? What's that supposed to mean?
  I'm not sure, but it sure sounds like Mr. Reiser had something to do with it.
  
  --
  PlusFive Slashdot reader for Android. Can post comments.
10. Re:Attack? by Anonymous Coward · 2012-12-14 20:46 · Score: 0
  
  Just to be clear, I didn't mean hack in the pejorative sense (I'm old) I meant it in the "holy shit, that's clever" sense WRT "mangle, hit target."
  I also meant larger to smaller space. Not small in the sense of, well, small. Sorry.
  
  If a hash function can get you there in 1 step with high probability, that's interesting.
  Yes, but "1 step" means several machine ops, typically. Hashing's 1-step also has to be followed by a full test, because otherwise, for all you know you're looking at a collision -- even if the probability is very high, you still must test. Following a tree can also get you there in one step, with the same terminal compare, presuming the leaf is on the trunk. If it's not, locate time grows in very small steps without in any way impacting anything else's locate time. A probe into a tree is a series of indexing operations, no more, until the leaf is located or the twig is bare. That is really efficient. Each probe can be done with just a few machine instructions even with an old-school processor. A modern one might be able to do it faster, I actually don't know what the instruction sets really look like any more, but I can sure tell you that old-school basic indexing ops are pretty much 1:1 with tree traversal requirements. Furthermore, the trees don't tend to be deep; and when they are deep-ish, they don't tend to be wide on the final twig. A per-directory tree design is amazingly solid and won't fall prey to similar-ish filename problems (or really, any other kind of problem I can think of at the moment.)
  I've written assemblers that used hash tables to map from a large but known mostly empty space to a small space... just to get the speed. I just screwed around until the hash gave me unique results for every mnemonic in progressively smaller tables, and then optimized and used that hash. So I certainly have uses for them. But in a filesystem namespace, hashes look to me like they spend more time making a mess than they do helping you out, and that this could happen at any time, volume nearly empty or volume full, etc. whereas a tree-based, per-directory filesystem namespace comparison is pretty straightforward, even with wide characters; nothing much to concoct there.
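  The trial-and-error approach described above amounts to searching for a perfect hash over a fixed, known set of keys. A minimal Python sketch might look like this; the multiplicative string hash, the mnemonic list and the search ranges are all hypothetical choices for illustration. Note that the lookup still ends with a full comparison, which is what rejects input that isn't a valid mnemonic.
  
  ```python
  def simple_hash(word, mult):
      """Hypothetical multiplicative string hash; `mult` is the tunable knob."""
      h = 0
      for ch in word:
          h = (h * mult + ord(ch)) & 0xFFFFFFFF
      return h
  
  
  def find_perfect_hash(words, max_size, mults=range(1, 1000)):
      """Try table sizes from smallest upward and return the first
      (size, mult) under which every word lands in a distinct slot."""
      for size in range(len(words), max_size + 1):
          for mult in mults:
              slots = {simple_hash(w, mult) % size for w in words}
              if len(slots) == len(words):   # no collisions: a perfect hash
                  return size, mult
      return None
  
  
  def build_table(words, size, mult):
      table = [None] * size
      for w in words:
          table[simple_hash(w, mult) % size] = w
      return table
  
  
  def lookup(table, mult, word):
      """One hash, one probe, one compare: either a valid mnemonic or an error."""
      slot = table[simple_hash(word, mult) % len(table)]
      return slot == word
  ```
  
  With a made-up list such as `["MOV", "ADD", "SUB", "JMP", "CMP"]`, `find_perfect_hash(words, 64)` returns a small table size and a multiplier; this only works because the key set is known in advance, which is the poster's point about filesystem namespaces.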
11. Re:Attack? by Anonymous Coward · 2012-12-14 20:50 · Score: 0
  
  Worst case for a tree is O(n), not O(log n). A 6 branch tree traversal takes, on average, 6x as long as a 1 branch search.
12. Re:Attack? by maxwell+demon · 2012-12-14 21:31 · Score: 1
  
  Worst case for a tree is O(n), not O(log n).
  What exactly did you not understand about balanced tree?
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
13. Re:Attack? by Anonymous Coward · 2012-12-15 03:30 · Score: 0
  
  Binary trees are complex and require re-balancing to maintain high search performance.
  Hashes are generally less complex and don't require re-balancing to maintain performance. Hash tables are also faster, but can consume more memory/space depending on the fill factor. However, you have to choose a good hash function and also handle hash collisions sanely. In a well-implemented case, hashes have better lookup complexity than trees or other data structures.
  Best case binary tree performance is O(log n), which means that as the number of items you are searching grows, lookup time grows in proportion to log n. Of course, an unbalanced binary tree will have worse performance than this, approaching O(n) in the extreme worst case.
  Best case hash table performance is O(1), which means, performance does not change regardless of your dataset size. (as long as your hash function does not duplicate and you can store the hash table so all parts of it have equal access time). As with binary trees, a badly implemented hash will result in performance approaching O(n) in the worst case.
  For those that don't know O(n) is equivalent to a dumb linear search, which looks through all items 1 by 1 until the desired item is found. O(n) is fine for small data sets, but really hurts search performance as your data set grows.
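  The O(1)-to-O(n) degradation described in these posts is easy to measure. The sketch below (Python; the function name and the two toy hash functions are invented for illustration) builds a separate-chaining table with list buckets and counts key comparisons. With a hash that spreads n keys over the buckets, looking up every key once costs about n comparisons in total; when every key is forced into the same bucket, it costs 1 + 2 + ... + n = n(n+1)/2, i.e. O(n) per lookup, which is the kind of blow-up a Hash-DoS exploits.
  
  ```python
  def chained_lookup_cost(keys, nbuckets, hash_fn):
      """Build a separate-chaining hash table (plain lists as buckets) and
      return the total number of key comparisons needed to look up every
      key once."""
      buckets = [[] for _ in range(nbuckets)]
      for k in keys:
          buckets[hash_fn(k) % nbuckets].append(k)
  
      comparisons = 0
      for k in keys:
          for candidate in buckets[hash_fn(k) % nbuckets]:
              comparisons += 1          # every probe is a full key compare
              if candidate == k:
                  break
      return comparisons
  ```
  
  For 1000 integer keys, a spreading hash like `lambda k: k` over 1024 buckets costs 1000 comparisons in total, while `lambda k: 0` costs 500500.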
14. Re:Attack? by Anonymous Coward · 2012-12-15 07:25 · Score: 0
  
  Perhaps everything. [shrug]
  I use a tree lookup mechanism that builds an extremely efficient compare of all the relevant leading portions of the query into the extremely efficient travel to the leaf or nub and when it gets there, it's done. I am under the very strong impression that it's faster than any other tree traversal mechanism, and consequently have not had any reason to look for variations on the theme. My trees are not "balanced" in any sense of the word I'm familiar with, the structure is 100% related to the data in them without any purely structural bias, and the tree structure perfectly represents the data without any bias.
  So perhaps you've got something there of great fabulousness. Or perhaps you're just talking about an unnecessarily complicated, inherently inferior design I wouldn't bother with in the context of today's abundant memory. I dunno. Wasn't in need of a change; was just pointing out that trees beat hashes because hashes, lovely as they may be at times, are likely to break horribly under unpredictable loads, which is what a filesystem namespace consists of.
15. Re:Attack? by Anonymous Coward · 2012-12-15 07:59 · Score: 0
  
  Binary trees are complex and require re-balancing to maintain high search performance.
  Not talking about binary trees. Talking about full trees, which don't require rebalancing or any other kind of balancing. They're not complex. They just work, don't require any heavy engagement to extend in any direction. And they're just about optimally fast for datasets where you have no idea what is going to end up in there -- they never slow down past a certain point (which is still very fast indeed) based on the dataset. They are certainly memory heavy, but as I said above, that's no longer a concern for most applications.
  
  For those that don't know O(n) is equivalent to a dumb linear search, which looks through all items 1 by 1 until the desired item is found. O(n) is fine for small data sets, but really hurts search performance as your data set grows.
  Thanks. Not a math-head at all, sadly. By O(n), I was thinking n=length of search term. A ten character filename could take ten very fast steps to locate or install as a leaf. No more, ever, and usually a lot less. I apologize for trying to use a notation I didn't understand.
  So my tree mechanism isn't O(n), as search time increases linearly (and minimally) with the length of the search term, there's no rebalancing or re-anything, and the whole process is completely insensitive to the size of the dataset.
  As an example, if you were storing ten-character filenames in an assembled tree, and it took three machine instructions to locate a node or leaf on a branch (that's about right), it'd take a max of thirty machine instructions to locate a ten-character symbol, or identify a node where a new leaf should be placed. It'd never be slower than that, and it would only be that slow if the space around it contained symbols that were identical except for the last character.
  I don't know how to formally characterize that in terms of O(whatever), but I do know that it's really difficult to beat in overall speed, and never misbehaves under any kind of namespace load. :) A hash that doesn't happen to collide is about as fast; as I mentioned above, you might hit your hash first time out, but you still have to compare what you found to what you're looking for to be sure that's actually what happened -- because hashing allows for it to not happen. Traversing the tree is basically subsumed into that compare time, so it's very similar. Somewhere around O(1), then?
  In my assembler designs, I controlled the symbol-space and was able to pre-arrange for collisions to be impossible, so the hash either hit a valid target or represented error input - just two ways to go after hashing. But you can't do that when the symbol-space is built from any/every valid combination of characters -- that is almost guaranteed to act badly. Putting a tree to work there (not a binary tree, just a full tree) instead makes it all work smoothly.
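  What the poster calls a "full tree" sounds like a trie (prefix tree): one node per character, indexed directly by the next character. That reading is an assumption. A minimal Python sketch over byte strings, with a 256-slot child array per node, shows the properties claimed: lookup takes at most len(name) indexing steps regardless of how many names are stored, and nothing is ever rebalanced. The price is memory, as the poster says: every node carries 256 slots. The class names are illustrative.
  
  ```python
  class TrieNode:
      __slots__ = ("children", "present", "value")
  
      def __init__(self):
          self.children = [None] * 256  # one slot per possible next byte
          self.present = False          # True if a stored name ends here
          self.value = None
  
  
  class ByteTrie:
      def __init__(self):
          self.root = TrieNode()
          self.nodes = 1                # allocated nodes, to show memory cost
  
      def insert(self, name: bytes, value):
          node = self.root
          for b in name:                # iterating bytes yields ints 0..255
              child = node.children[b]
              if child is None:
                  child = TrieNode()
                  node.children[b] = child
                  self.nodes += 1
              node = child
          node.present = True
          node.value = value
  
      def lookup(self, name: bytes):
          """Return the stored value, or None; at most len(name) steps."""
          node = self.root
          for b in name:
              node = node.children[b]   # a plain indexing operation
              if node is None:
                  return None
          return node.value if node.present else None
  ```
  
  Names that share a prefix share nodes: storing `b"readme.txt"`, `b"readme.md"` and `b"read"` allocates only 13 nodes including the root.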
16. Re:Attack? by petermgreen · 2012-12-16 03:28 · Score: 1
  
  A simple binary tree has similar problems to a simple hash table. Namely, by controlling the items that are added to the structure, it's possible to effectively turn it into a list (in a hash table you do it by putting everything in one bucket; in a binary tree you do it by making sure that each node you add ends up as a child of the previous node you added).
  Of course there are countermeasures against this attack on binary trees just like there are countermeasures against attacks on hash tables.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
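  petermgreen's point is easy to demonstrate. In this minimal Python sketch of a plain, unbalanced binary search tree (names are illustrative), inserting keys in sorted order makes every new node the right child of the previous one, so the tree's height equals the number of keys and it behaves like a linked list; the same keys inserted in random order give a shallow tree. Self-balancing trees (AVL, red-black) are the countermeasure on the tree side, just as randomized or keyed hashing is on the hash side.
  
  ```python
  import random
  
  
  class Node:
      __slots__ = ("key", "left", "right")
  
      def __init__(self, key):
          self.key = key
          self.left = None
          self.right = None
  
  
  def bst_insert(root, key):
      """Insert into an unbalanced BST (iteratively) and return the root."""
      new = Node(key)
      if root is None:
          return new
      node = root
      while True:
          if key < node.key:
              if node.left is None:
                  node.left = new
                  return root
              node = node.left
          else:
              if node.right is None:
                  node.right = new
                  return root
              node = node.right
  
  
  def build(keys):
      root = None
      for k in keys:
          root = bst_insert(root, k)
      return root
  
  
  def height(root):
      """Number of nodes on the longest root-to-leaf path."""
      if root is None:
          return 0
      best = 0
      stack = [(root, 1)]
      while stack:
          node, depth = stack.pop()
          best = max(best, depth)
          for child in (node.left, node.right):
              if child is not None:
                  stack.append((child, depth + 1))
      return best
  
  
  def shuffled_keys(n, seed):
      """Keys 0..n-1 in a reproducible random order."""
      keys = list(range(n))
      random.Random(seed).shuffle(keys)
      return keys
  ```
  
  `height(build(range(1000)))` is 1000 (a list in disguise), whereas `height(build(shuffled_keys(1000, 3)))` is only a few dozen.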
17. Re:Attack? by Decameron81 · 2012-12-21 08:00 · Score: 1
  
  So ... a vulnerability was found.
  VULNERABILITY. ATTACK. Different words. Different meanings.
  Exactly! That's what I meant with my question, although I think it went unnoticed by some. You just don't find an attack!
  
  --
  diegoT
No by ArchieBunker · 2012-12-14 16:31 · Score: 2, Interesting

Instead of picking a filesystem and moving forward, people will moan and cry and eventually split into a few different groups with beta-level implementations. Sound on Linux is a great example: two completely different sound drivers that both work half-assed. What's the word with XFS these days?

--
Only the State obtains its revenue by coercion. - Murray Rothbard
1. Re:No by drinkypoo · 2012-12-15 02:24 · Score: 1
  
  What's the word with XFS these days?
  I don't know, but my last word is that I dropped it due to data corruption and now I'm using ext4 while I'm waiting for btrfs.
  I was hoping to be using bcache by now too, but alas, no. I have an 80GB SSD and a 320GB HDD, which I will bump up to 2x1TB stripe and backup to 2TB external... just as soon as I can install with bcache without having to do it all manually.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
2. Re:No by diegocg · 2012-12-15 03:35 · Score: 1
  
  What's the word with XFS these days?
  http://www.youtube.com/watch?v=FegjLbCnoBw
3. Re:No by Wolfrider · 2012-12-15 10:15 · Score: 1
  
  --Have you tried JFS? I'm a heavy Vmware user and it works really well, with minimal CPU usage.
  
  --
  .
  == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Attack vector by aNonnyMouseCowered · 2012-12-14 18:30 · Score: 1

Indeed, the title makes you think that BTRFS was trojaned or, worse, is malware.
So a script kiddie found by someones · 2012-12-15 00:23 · Score: 1

So a script kiddie found a vulnerability in an experimental filesystem.
There are warnings EVERYWHERE not to use btrfs in a stable environment, as it's in development and pretty buggy.
But it's the amazing feature set that btrfs offers that matters, even if it's still pretty broken and you cannot rely on it without doing daily backups... and all this script kiddie is concerned about is colliding hashes in a shared environment, which is a pretty unlikely scenario, as this will not happen naturally?
A btrfs filesystem becoming corrupt because it fills up would be something to care about at this time.
1. Re:So a script kiddie found by iggymanz · 2012-12-15 03:50 · Score: 1
  
  actually most client/server file systems can be DOS'd by too many requests.....local access generally implies the ability to clog things up
Re:Dedupe doesn't belong in a filesystem by LWATCDR · 2012-12-15 02:20 · Score: 3, Informative

You then turn it off.... And go take your meds.
I do not think you know what DeDup means. You as a user still see two copies of the file. If you make changes to one copy of the file, it will only change that copy of the file. It is not like a link. In other words, it is totally transparent to the end user but saves drive space. So if you work in a large organization and someone sends out an email to all 4000 people, that email will only take up the space of one email, even if everyone saves it on the IMAP server.
In other words, you do not know what you are talking about; you probably do not need these functions because you probably do not run a server or servers for a large organization; you seem to have some anger issues; and you are maybe just a little nuts.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
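The behaviour described here (identical content stored once, yet each user still sees an independent copy) can be sketched in a few lines. This is a toy, assumption-laden Python illustration, not how any particular filesystem or mail server implements it: real systems usually deduplicate at block or extent level rather than whole files, and some verify bytes rather than trusting a hash match. The class and method names are made up. Writing to one name gives that name its own content without touching the others, which is what distinguishes deduplication from a hard link.

```python
import hashlib


class DedupStore:
    """Toy whole-file dedup: identical contents are stored once and shared
    by reference count; rewriting one file never affects the others."""

    def __init__(self):
        self.blobs = {}   # digest -> content, stored once
        self.refs = {}    # digest -> number of files pointing at it
        self.files = {}   # filename -> digest

    def _release(self, digest):
        self.refs[digest] -= 1
        if self.refs[digest] == 0:          # last user gone: free the content
            del self.refs[digest]
            del self.blobs[digest]

    def write(self, name, data):
        # Assumes equal SHA-256 digests mean equal content (see lead-in).
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refs[digest] = self.refs.get(digest, 0) + 1  # take new ref first
        old = self.files.get(name)
        self.files[name] = digest
        if old is not None:
            self._release(old)              # then drop the old one

    def read(self, name):
        return self.blobs[self.files[name]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())
```

Saving the same email into 4000 mailboxes stores it once; editing one mailbox's copy adds only that edited version, and the other 3999 still read the original.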
Can we get a real editor? by Anonymous Coward · 2012-12-15 02:41 · Score: 2, Insightful

Editors please! I normally expect even a submitter to know the difference between an attack and a vulnerability. However the editor damn well better know the difference. When I read that an ATTACK had been found in btrfs I went to read about how some malicious code had been placed into the code for btrfs. Maybe this code modified data, erases stuff, sends data to China, or just renames files. But no, this was a simple vulnerability. They didn't find an attack in btrfs, they found the potential for an attack - which is called a vulnerability. Let's at least make an effort here.
1. Re:Can we get a real editor? by Nimey · 2012-12-15 04:45 · Score: 2
  
  ed(1) is the standard text editor.
  
  --
  Hail Eris, full of mischief...
  
  E pluribus sanguinem
Enterprise architect here by iggymanz · 2012-12-15 05:48 · Score: 1

Deduplication typically isn't done by the operating system in production systems, it is a feature of enterprise grade storage, backup and archival systems.
Snapshots and encryption can be done in GNU/Linux, or done outside the OS.
What enterprise-grade storage/backup/archival systems are you using? In most cases, the obvious solution will already be evident from that answer.
I'm no expert by Anonymous Coward · 2012-12-15 07:05 · Score: 0

But isn't TRWTF that you can use CRC32C collisions to attack a system still in development in 2012? I mean, I opened the link wondering if butterfs is stupid enough to be using MD5 or something, and the attack is on CRC32C - what sick joke is that?
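The reason CRC32C is the wrong tool against a deliberate attacker is that it is affine over XOR: for equal-length inputs, crc(a XOR b XOR c) = crc(a) XOR crc(b) XOR crc(c). An attacker can therefore predict exactly how the checksum changes when bits of a name change, and solve for collisions algebraically instead of brute-forcing them. The minimal Python sketch below uses the standard CRC-32C parameters (reflected polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF). btrfs is reported to derive its directory-entry hashes from a CRC32C of the name, but its exact seed may differ, so treat this as an illustration of the property rather than of btrfs's code. The helper names are illustrative.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli): reflected poly 0x82F63B78,
    init 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("inputs must have equal length")
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)


def checksum_delta(flip: bytes) -> int:
    """How crc32c changes when the bits in `flip` are toggled in ANY message
    of the same length: crc(x ^ flip) ^ crc(x) == crc(flip) ^ crc(zeros).
    The change does not depend on x, which is what makes forging easy."""
    return crc32c(flip) ^ crc32c(bytes(len(flip)))
```

A keyed or cryptographic hash does not have this structure, which is why it resists this kind of precomputed collision.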
It's not a problem about BtrFS at all by Anonymous Coward · 2012-12-15 07:24 · Score: 0

It's just a bug in bash's handling of special characters.
When certain special characters are present, this problem can be reproduced whichever filesystem you use (I actually used EXT4 to reproduce it).
Details on my blog : http://wronganswer.tk/?p=272178
Re:Another crazy white guy goes on a rampage by Zero__Kelvin · 2012-12-15 08:15 · Score: 2

It is stupid to make this racial, but since you did, when was the last time a black guy opened up on a group of innocent school children?

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Epic Fail (The joke's on you) by Zero__Kelvin · 2012-12-15 11:23 · Score: 2

A good joke requires significant planning.

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Vulnerability if you already have access .... by JasterBobaMereel · 2012-12-19 01:12 · Score: 1

So if you get local access to a system running a btrfs filesystem, then you can destroy it... but if you have local access, you can easily do that anyway with any filesystem...?

--
Puteulanus fenestra mortis