Optimizing Linux Systems For Solid State Disks
tytso writes "I've recently started exploring ways of configuring Solid State Disks (SSDs) so they work most efficiently in Linux, in particular Intel's new 80GB X25-M, which has fallen to a street price of around $400 and is thus within my toy budget. It turns out that the Linux storage stack isn't set up well to align partitions and filesystems for use with SSDs, RAID systems, and 4k sector disks. There is also some interesting configuration and tuning that we need to do to avoid potential fragmentation problems with the current generation of Intel SSDs. I've figured out ways of addressing some of these issues, but it's clear that more work is needed to make it easy for mere mortals to use next-generation storage devices efficiently with Linux."
I really don't care about performance, when they retail for $400. Talk to me when I can get an 80GB one for under $50.
I think the bigger challenge will be in getting mere mortals to have a $400 toy budget to afford the SSD
Your government is working towards it.
Yes, we do need progress in that area. However, for many of us who require better-than-average data security, the read/write behaviour of SSDs leaves the devices extremely vulnerable to analysis and discovery of data that the owner or author believes to be inaccessible to others: 'secure wiping', or the lack thereof, is the issue. As I understand it, 'secure wiping' programs fail to do their job on SSDs. It's been reported among 'criminals' that SSDs are a 'forensic analyst's dream come true', and so they must be for corporate spies and others who have a yen for theft of private data.
I know right? Send some cheddar my way Mr. Gates.
I for one hope he is successful so that when SSDs become more affordable, or even the default, Linux will be nicely optimized.
This article makes me wonder if any OS is really properly optimized for SSDs. Has there been any analysis of whether Windows machines properly optimize the use of solid state disks? Perhaps the problem goes beyond just Linux?
If I mount /home on a separate drive, (good to do when upgrading) the rest of the Linux file system fits nicely on a small SSD.
From economics, let's turn our attention to optimizing this toy of ours. The thing with SSDs is that they don't have a read/write head to worry about. This means that no matter where the data is stored in the device, all we need to do is specify the fetch location and the logic circuits select that block to extract the data from the desired location. From what I've heard, SSDs have an algorithm that assigns different blocks to store the data so that the memory cells in a single location aren't overused.
Yes, that's true. But the important thing is ensuring that the OS/filesystem breaks the data up into appropriate sized chunks that match up with the block size that the disk controller uses. This has nothing to do with fragmentation.
Why not use it as 'permanent' RAM? I'd be more than happy with only an enormous hash map on it. Just an easy API to handle it.
Forget about using it as a disk, it's not.
Most of us can't afford to worry about this, but does the Fusion-io suffer from this issue?
Surely it's not the block size. I know nothing about filesystems beyond basics. Windows could specify the block size to be used. I assumed that Linux did the same? I have no idea about OS X either.
Are there standard block sizes in use for Linux and OS X filesystems? Can they be modified when they are formatted? If so, and the issue really is due to blocksize and fragmentation as a result, this would seem like an easy fix. Linux and OS X already resist fragmentation. I won't speak to MS's efforts there as they state NTFS does, but the implementation seems to be very different in the real world.
Some of you FS gurus fill us in here. How hard is it to implement something like variable block sizes, or to allow you to specify block size at format time?
I've been doing this for years with CF cards.
Put the volatile stuff on a spindle, the rest on a CF card.
> Vista has already started working around this problem, since it uses a default partitioning geometry of 240 heads and 63 sectors/track. This results in a cylinder boundary which is divisible by 8, and so the partitions (with the exception of the first, which is still misaligned unless you play some additional tricks) are 4k aligned. So this is one place where Vista is ahead of Linux...
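The arithmetic behind that quote can be checked with a minimal sketch. It assumes 512-byte logical sectors and compares against the traditional 255-head geometry, which is my assumption here and is not mentioned in the quote; the function names are hypothetical.

```python
SECTOR_SIZE = 512   # bytes per logical sector (assumed)
ALIGNMENT = 4096    # the 4k boundary we care about, in bytes


def cylinder_sectors(heads, sectors_per_track):
    """Number of sectors in one cylinder for a given CHS geometry."""
    return heads * sectors_per_track


def is_aligned(start_sector, alignment=ALIGNMENT, sector_size=SECTOR_SIZE):
    """True if a partition starting at start_sector begins on a 4k boundary."""
    return (start_sector * sector_size) % alignment == 0


vista_cylinder = cylinder_sectors(240, 63)    # geometry from the quote
legacy_cylinder = cylinder_sectors(255, 63)   # traditional geometry (assumption)

print(vista_cylinder, is_aligned(vista_cylinder))
print(legacy_cylinder, is_aligned(legacy_cylinder))
# A first partition starting at sector 63 is misaligned under either geometry.
print(is_aligned(63))
```

A cylinder of 240 x 63 sectors is a multiple of 8 sectors, so partitions that start on later cylinder boundaries land on 4k boundaries, while a first partition starting at sector 63 does not.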
Although the technology it is used in is repugnant, NTFS has always been the One True Filesystem. It descended from DIGITAL's ODS2 (On Disk Structure 2) which traces back to the original Five Models (PDP 1, 8, 10, 11 and 12). You see, ODS was written by passionate people with degrees and rich personal lives in Massachusetts who sang and danced before the fall of humanity to the indignant Gates series who assimilated their young wherever possible and worked them into early graves during his epic battle with the Steves before the UNIX enemy remerged after a 25 year sleep and nuked the United States, draining all of its technological secrets to the other side of the world. Gates, realizing what he's done, now travels the universe seeking to rebuild his legacy by purifying humanity while the Steve series attempts to rebuild itself. Some of the original Five are still around, left to logon to Slashdot and witness what's left of the shadow of humanity still in the game as they struggle blindly around in epic circles indulging new and different ways to steal music, art and technology to make up for their lack of creativity long ago bred out of them by the Gates series.
SSDs gradually gain more and more sophisticated controllers which do more and more to try to make the SSD seem like an ordinary hard drive, but at the end of the day the differences are great enough that they can't all be plastered over that way (the fragmentation/long term use problems the story linked to are a good example). I know that (at present- this could and should be fixed) making these things run on a regular hard drive interface and tolerate being used with a regular FS is important for Windows compatibility, but it seems like a lot of cost could be avoided and a lot of performance gained by having a more direct flash interface and using flash-specific filesystems like UBIFS, YAFFS2, or LogFS. I have to wonder why vendors aren't pursuing that path.
> This means that no matter where the data is stored in the device, all we need to do is specify the fetch location and the logic circuits select that block to extract the data from desired location.
Which is why you don't need head-optimized I/O schedulers like Anticipatory, which waits a couple of ms after every read to see if there's more from that area, thus saving on seek times.
SSDs must be optimized differently. For instance, they can't write arbitrary small pieces of data, only whole blocks. Thus, if you want to optimize for them, you'd better make sure to write whole blocks at a time if possible, and not have small files cross block boundaries if they don't have to.
I've been wrestling this idea around as a sound studio solution, and it seems that an external storage unit makes the most sense, with a DRAM card for the currently working files. Almost affordable, anyway.
Every mass storage device since cassette tapes reads and writes a whole block at a time.
I have mod points, but cannot find the "Totally Bonkers" mod...
Don't do drugs, man.
Partition the drive into BlockSize/4KB logical disks. Make sure the alignment is correct, then RAID these into 1 big disk. This gives us one usable disk with maybe 128KB clusters. Small files would need to share a cluster, but they would have done that anyway.
. . . which runs on the Nokia N800/N810 "Internet Tablets" (www.maemo.org). They might have done some tweaking, since this is Linux running on SSDs.
When I saw the headline, I was thinking not so much of the fragmentation issues as of the repeated rewriting of logs and other small, frequently accessed files, which SSDs are susceptible to (maximum number of rated write cycles). Have there been any developments in that area?
I don't think this is going to be a significant problem when compared to normal seek time problems.
Let's say we have 100k of data to read. 512-byte blocks would require 200 reads. 4k blocks would require 25 reads.
For rotating discs: If the data is contiguous, we have to hope that all the blocks are on the same track. If they are, then there is 1 (potentially very costly) seek to get to the track with all the blocks on it. The cost of the seek is dependent on the track it's going to, the track it's on, and whether or not the drive is sleeping or spun down. Otherwise we also get to do another very short seek, which is going to add a bit of time to get to the next adjacent track. Worst case scenario all 200 blocks are on different tracks, scattered randomly on the platter, requiring 200 seeks. Ouch ouch ouch.
For SSDs: What is important is the number of cells we have to read. Cells will be 4k in size. All seek times are essentially zero. Best case scenario, all data is contiguous, and the start block is at the start of a cell. Read time boils down to how fast the flash can read 25 cells. Worst case scenario is where the data is 100% fragmented, such that each of the 200 512-byte blocks resides in a different cell, requiring 200 cell reads (an 8-fold increase in time required). There will also be overhead in copying out the 512-byte data from each buffer and assembling things, but this time is negligible for this comparison.
While the 8x time increase (order N) looks significant, it's important to compare the probabilities involved, and just how bad things get. The most important difference between how these two drives react is the space between fragments. The "worst case" for an SSD, 100% fragmentation, is highly unlikely. I don't even want to think about what a spinning disc would do if asked to perform a head seek for 100% of the blocks in, say, a 1MB file. The read head would probably sing like a tuning fork at the very least. With 2000 cell reads compared to 2000 seeks, the SSD will win handily every single time, even if the tracks on the disc are close.
If the spacing between fragments is anything near normal, say 30-100k, then there will be some seeking going on with the disc, and there will be some wasted cell reads with the SSD, but having to do one extra cell read compared with having to do an extra head seek, again the SSD wins hands down. The advantage of the SSD actually goes down as fragmentation goes down, because most fragments are going to cause a head seek, each of which will significantly widen the time gap. Also a spinning disc will read in the blocks much faster than the cells on an SSD.
I realize the OP was more describing the possibility of "not so much bang for the buck as you are expecting" due to fragmentation, and I know the above hits more on comparing the two than what happens to the SSD, but if you consider the effects of fragmentation on a spinning disc, and then weigh how the impact compares with a SSD, it's easy to see that fragmentation that sent you running for the defrag tool yesterday may not even be noticeable with a SSD. So I'd call this a "non-issue".
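For anyone who wants to check the numbers in the comment above, here is a rough sketch of the read counting. It assumes 100k means 100 x 1024 bytes and that a "cell" is 4096 bytes; the function name is made up for illustration.

```python
import math


def reads_needed(data_bytes, unit_bytes):
    """Reads required for contiguous data that starts on a unit boundary."""
    return math.ceil(data_bytes / unit_bytes)


DATA = 100 * 1024                        # 100k of data
blocks_512 = reads_needed(DATA, 512)     # filesystem blocks of 512 bytes
cells_best = reads_needed(DATA, 4096)    # best case: contiguous, cell-aligned
cells_worst = blocks_512                 # worst case: every block in its own cell

print(blocks_512, cells_best, cells_worst, cells_worst / cells_best)
```

Under those assumptions the fully fragmented case costs 8 times as many cell reads as the contiguous case, and each of those reads still avoids a head seek.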
What I'm waiting for is them to invest the same dev time in read speeds as write speeds. SSDs don't appear to be doing any interleaved reads - they're doing it for the writes because they're so slow. Though at this point I wonder if read speeds are just plain running into a bus speed limit with the SSDs?
Please mod the parent funny; so say we all.
From what I can scrape together quickly off the Internet (IANASE: I am not a software engineer), the biggest difference seems to be the lack of a need for error checking, disk defrag, etc. Since a normal spinning HDD does not actually delete a file but just removes the markers, the filesystem treats all areas the same and does the same things to both real and non-real data to keep the disk state sane. On an SSD all of this leads to a lot of unneeded disk usage and premature degradation of the drive itself.
There seems to be more about Data set management but I don't quite understand it.. maybe someone more knowledgeable could explain it?
I'm just sitting here thinking. Doesn't an SSD have a preset number of writes in it due to its nature?
Does it really matter if they spread these writes around on the hard drive when the number of writes the drive is capable of doing is still the same in the end?
To drastically oversimplify, let's say that each block can be written to twice. Does it really matter if they used up the first blocks on the drive and just spread towards the end of the drive partition with general usage rather than jumping all over to try to spread the writes around?
Am I thinking about this the wrong way? What benefit does it give them to spread the writes around if the total number of writes doesn't change? Doesn't it just further fragment the files with little gain?
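One way to look at the question is a toy simulation. This is only a sketch under loud assumptions: every block survives the same number of writes, the workload keeps rewriting one "hot" logical block (think of a log file), and the wear-leveling policy simply picks the least-worn physical block. Real controllers are far more complicated.

```python
def writes_until_first_failure(num_blocks, endurance, wear_leveling):
    """Count host writes before some physical block exceeds its endurance.

    The host rewrites the same logical block over and over. Without wear
    leveling that always hits physical block 0; with it, each write is
    redirected to the least-worn physical block.
    """
    wear = [0] * num_blocks
    total = 0
    while True:
        target = wear.index(min(wear)) if wear_leveling else 0
        if wear[target] >= endurance:
            return total          # the next write would hit a worn-out block
        wear[target] += 1
        total += 1


# The "each block can be written to twice" oversimplification from above.
print(writes_until_first_failure(8, 2, wear_leveling=False))
print(writes_until_first_failure(8, 2, wear_leveling=True))
```

The total write budget of the drive is the same either way, but real workloads are uneven: without spreading, the hot block wears out while the rest of the drive has barely been touched.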
Yes, but for SSDs the blocks are larger, which causes problems when essentially all software is optimized for smaller blocks.
You mean after all the hoopla the Linux people made about the Anticipatory Scheduler, the code is nothing more than:
wait_awhile()
What a ripoff.
"coincidentally", not "ironically".
I haven't yet found a SATA device (even DOMs) that requires CHS addressing. Clearly it was a mistake to use hardware quirks to address sectors, but then again, ATA became a de facto standard before anyone realized it might become one.
Why not functionally group files to decrease or eliminate fragmentation? Or maybe this is already done.
For example - I have a large collection of MP3 files. They essentially do not change, as in I don't edit them, and rarely erase them. The file system could look at the type of file (mp3 vs. doc) and place it accordingly. It could also look at the last change to the file and place it in a certain area. Older unchanged files would be placed in a tightly packed file area that is optimized and not fragmented.
..........FULL STOP.
that there's way too much effort and so much overhead for so little gain and the fatal problem of SSDs having a limited lifespan is just too much to overcome.
SSDs are awesome as a simple storage medium for stuff you don't change around much, i.e. a replacement for floppies/optical media/etc. They are NOT, however, a replacement for hard drives, and it's sad that people continue to push them in that direction when it is utterly futile and, frankly, stupid to do so.
AC because this is a harsh truth that no one wants to admit and therefore would be modded down to oblivion by mods that believe it's a troll.
Sure. There are *lots* of considerations beyond speed to want SSDs
And SSD drives are also shock-resistant.
The drives will be shocked when they see what I have in my pr0n collection.
Seems to me that Sun's zfs filesystem is ready to use the ssd storage. The copy-on-write strategy would seem to avoid the hot spots as zfs picks new blocks from the free pool rather than rewriting the same block.
> Although the technology it is used in is repugnant, NTFS has always been the One True Filesystem.
I thought ZFS was.
Good analysis. The statistics I've read indicate that SSDs don't perform all that much better than hard drives in real-world scenarios. I think this is part of the reason for that performance. On the other hand, they do use less energy, which is a clear positive for a laptop.
> On the other hand, they do use less energy, which is a clear positive for a laptop.
And thus they are cooler. A clear positive for any system, but especially a laptop.
They are also silent and don't vibrate.
They are also, from what I understand, more reliable.
I'm seriously considering flash drives for my desktop PC... they just need one more capacity jump and I think they'll be worth it. $400 for 128MB is a touch small.. but I'll go for it at $400 for 256MB. On my main PC I'm only using 236GB of my 500GB drive, and I could easily move 150GB of that onto my 1TB external e-sata drives that I turn on when I need.
I purchased an X300 Thinkpad for the company this week and took a close look at it. I thought expensive business notebooks come without crapware. And I was sure the X300 would be optimized. But they had defrags scheduled! I always thought defrag is a no no for ssds. Now I am not sure anymore. I deinstalled it first. But who knows?
I think Theodore should look into technologies like the ZFS L2ARC (that is, using SSD as an additional cache to supplement disks based on rotating rust). The L2ARC stores recently evicted pages from the primary ARC (the Adaptive Replacement Cache) of ZFS on SSD. From my view this is a more reasonable usage of SSD than just as another primary storage medium.
I recently wrote an article about the mechanism of ARC and L2ARC in conjunction with SSD in my blog, but I don't want to slashdot my site ;)
I would move /tmp to either a RAM disk or a hard drive. There is no point in having tmp files using up the lifespan of your SSD, especially after you just moved /home to extend its life. Also, you could move some of the stuff in /var to a hard drive or ramdisk. Good candidates might be /var/tmp and /var/log. Alternatively, you could just move the entire /var hierarchy to a hard drive.
> Why not functionally group files to decrease or eliminate fragmentation? Or maybe this is already done.
In a Linux system, this is easily done, but few people bother.
Most of the write activity in Linux is in /tmp, and also in /var (for example, log files live in /var/log). User files go in /home.
So, you can use different partitions, each with its own file system, for /, /tmp, /home, and /var.
The major problem with this is that, if you guess wrong about how big a partition should be, it's a pain to resize things. So my usual thing is just to put /tmp on its own partition, and have a separate partition for / and for /home.
The /tmp partition and swap partition are put at the beginning of the disc, in hopes that seek penalties might be a little lower there. Then / has a generous amount of space, and /home has everything left over.
When a *NIX system runs out of disk space in /tmp, Very Bad Things happen. Far too much software was written in C by people who didn't bother to check error codes; things like disk writes don't fail often, but when /tmp is 100% full, every write fails. A system may act oddly when /tmp is full, without actually crashing or giving you a warning. So, the moral of the story is: disk is cheap, so if you give /tmp its own partition, make it pretty big; I usually use 4 GB now. However, if you run out of disk space in /var, it is not quite as serious. Your system logs stop logging. And, many databases are in /var so you may not be able to insert into your database anymore.
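As a small illustration of keeping an eye on /tmp before it fills up, here is a sketch using the Python standard library. The 90% threshold and the function names are arbitrary choices of mine, not anything the systems described above do by default.

```python
import shutil


def usage_fraction(path="/tmp"):
    """Fraction of the filesystem holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:          # some pseudo-filesystems report zero size
        return 0.0
    return usage.used / usage.total


def is_nearly_full(fraction, threshold=0.9):
    """True once usage reaches the (arbitrary) warning threshold."""
    return fraction >= threshold


if __name__ == "__main__":
    frac = usage_fraction("/tmp")
    if is_nearly_full(frac):
        print(f"/tmp is {frac:.0%} full; writes may start failing soon")
```

Run from cron or similar, this gives a warning before programs that never check their write errors start misbehaving.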
The main Ubuntu installer is fast, because it wipes out the / partition and puts in all new stuff. So, if you have separate partitions for / and /home, life is good: you just let the installer wipe /, and your /home is safely untouched. It's annoying when you have /home as just a subdirectory on / and you want to run the installer. But, by default, the Ubuntu installer will make one big partition for everything; if you want to organize by partitions, you will need to set things up by hand.
steveha
SSDs have a different feel to them. The time to load a file is more consistent on my eeePC than on other laptops I own which have rotating disks.
My kingdom for a mod point today...
WTF! SSD means Solid State Drive. Even a 5-year old can tell there's no disk in there. What's up with these retarded self-submitted articles? They're rarely written by someone competent.
I have read this as well, but you run into difficulty when assigning a specific partition for swap space, because the drive limits its distribution to the assigned space only.
So, if 512M of that 80G is defined as swap, it will stay within the range of that swap partition, burning it out much faster than the rest of the drive.
One solution is to assign swap space to a large file contained within your primary partition (dd 512M worth of zeroes into a file and enable it as swap), but then you are routing all your swapping through XFS/ext4 or whatever, adding significant overhead.
If you'll pay $400 for 256*MB*, I think you've got a little too much money and should give me some....
that was beautiful man, just beautiful..[sniff]
That was my idea when I've proposed an "object storage system" here on /. a few months ago: associate type and metadata with every file, making them more "object-like" (as in object-oriented programming). The storage system would know the behaviour of each object (whether it is likely to grow, or more likely to be modified in place, or probably not modified at all, etc), and would choose the most efficient way of storing every particular kind of data. I've also proposed separate namespaces for each process, capability-based security, dropping paths in favour of non-hierarchical tags, and a few other "revolutionary" ideas that all had only one downside: nobody's going to break backwards compatibility, especially while the current system still "just works".
Good point, I will have to think about that...
Well, I fired up Ubuntu with the new configuration and I wasn't disappointed - WOW!
Booting is lightning quick - I am still doing a lot of downloads, so I haven't had a chance to run some real performance tests, but from what I have seen so far the results are impressive.
"tytso" is Theodore Ts'o.
He and Remy Card wrote ext2. He and Stephen Tweedie wrote ext3. He and Ming Ming Cao wrote ext4.
He maintains the filesystem repair tool (e2fsck) and resizing tool for those filesystems.
He also created the world's first /dev/random device, maintained the tsx-11.mit.edu Linux archive site for many years, and wrote a chunk of Kerberos. He's been the technical chairman for many Linux-related conferences. He pretty much runs the kernel summit.
He's certainly not a kid. I think he's about to turn 40.
Really, Intel ought to give tytso piles of free SSD hardware before it goes on sale. This would help Intel by encouraging tytso to optimize Linux for Intel's SSD hardware.
So many choices!
belt sander
nitric acid
cutting torch
charcoal and a blower
chip wired into an AC wall socket
thermite
repeated use as a model rocket blast deflector
drill press
I just recently put two 128GB SSDs in a RAID 0 set. I set up a RAM drive for use as /tmp and have /var going to another partition on a standard SATA hard drive. I changed fstab to mount the drives noatime so it doesn't record file access times. I also made some other tweaks, pointing any programs or services that write logs or use a temporary cache at /tmp. It's a software RAID, so I'm using /dev/mapper/-- as the device and I'm not exactly sure how to set the scheduler, although I have set a line in GRUB that I think does it.
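On checking which scheduler is actually in effect: the kernel exposes it per block device in /sys/block/<device>/queue/scheduler, with the active one in square brackets. A minimal sketch for reading it follows; the helper names are made up, and "sda" is only an example device. For a device-mapper setup you would presumably look at the underlying disks rather than the mapped device.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Pick the bracketed entry out of e.g. 'noop anticipatory deadline [cfq]'."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_scheduler(device, sysfs_root="/sys/block"):
    """Return the active I/O scheduler for a block device such as 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # 'sda' is an example; substitute the disks that back your RAID set.
    print(read_scheduler("sda"))
```

If the GRUB line is the elevator= kernel parameter, this is a quick way to confirm it took effect after a reboot.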
Ubuntu 64bit boots up in about 10 seconds.
*DrugCheese rants*
Fragmentation is a *DIFFERENT* issue in the world of SSD.
It now becomes a matter of 'number of commands issued'.
Let's assume a completely fragmented file of size X. Let's say size X is a multiple of one cluster (of size Y). Let's also say the controller does no read-ahead into the cache.
So to read all of the file I would need to issue X/Y reads. In the world of spinning disks there are several costs here: bytes sent over the cable to the controller to ask for each cluster (time A); time waiting for the controller to retrieve the bytes from the disc (time B); time for the bytes to be sent across the bus (time C), plus the overhead of the packets (time D); and finally the OS having to reassemble the bytes back into a coherent file (time E).
With an SSD, time B is no longer the dominating factor. Times A, D, and E become the dominating factors, and they are directly related to how fragmented the file is. Time C can vary wildly on a rotating disk depending on where the clusters are. On an SSD it is fairly constant and very low (though not 0).
So an unfragmented file could issue 1 command to say get N clusters and get a giant blast back. Or you could have a totally fragmented file and get (X/Y)*2 packets back and forth.
Now that is a worst case. But it's not something to just be 'ignored' because 'it's now faster, so don't worry about it'. The proper answer is 'yes, it is faster; how much better can we do?'. The SSD is still WAY slower than memory and WAY slower than the CPU.
With SSD's order of files is no longer as important as it is with current spinning disks. You also want to reduce fragmentation of 'empty' space. Why do that? To reduce the possibility of fragmenting a real data file. Both types still have this issue.
Also, why waste bytes on the bus on 'overhead' when you could be using that for real data? The bus will become THE dominating factor in how fast these things run real soon. Right now it's under (not by much).
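The command counting in this comment can be written down as a small sketch. It assumes one read command can cover an entire contiguous run of clusters (real devices cap the transfer size per command) and that each command costs one request packet plus one response packet; the function names are hypothetical.

```python
def read_commands(file_bytes, cluster_bytes, fragments):
    """Commands needed to read a file stored as `fragments` contiguous runs."""
    clusters = -(-file_bytes // cluster_bytes)   # ceiling division: X / Y
    if not 1 <= fragments <= clusters:
        raise ValueError("fragments must be between 1 and the cluster count")
    return fragments                             # one command per contiguous run


def packets(commands):
    """One request and one response per command."""
    return commands * 2


X, Y = 1024 * 1024, 4096                         # 1 MiB file, 4 KiB clusters
worst = read_commands(X, Y, fragments=X // Y)    # totally fragmented
best = read_commands(X, Y, fragments=1)          # one giant blast
print(packets(best), packets(worst))
```

With these example sizes the fully fragmented file costs (X/Y)*2 = 512 packets against 2 for the unfragmented one, which is the per-command overhead the comment is arguing about.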
In certain situations the increased performance of a SSD removes a bottleneck which would result in increased CPU/memory load. On certain platforms this means these components would spend less time in their lower power states, ie lowered cpu multiplier or core voltage level.
Task for task, an SSD saves power, possibly more than would be lost to any higher CPU speed steps, but in something like a looping benchmark more work is done in the same time, and therefore more power is drawn.
This phenomenon had Tom's Hardware fooled: http://www.tomshardware.com/reviews/ssd-hdd-battery,1955.html ("The SSD Power Consumption Hoax: Flash SSDs Don't Improve Your Notebook Battery Runtime - They Reduce It")
They later posted a retraction after some people pointed out this flaw.
I would like to see optimizations in Linux that take this effect into account, perhaps by increasing power-saving state thresholds to compensate.
Hah. I think he meant 256GB.
There is also the ability to "free" unused blocks (with CFA commands at least), maybe so they can be erased in the background, or freed from wear-leveling tracking. There is a commercial device-mapper plugin to force large physical sectors on devices that still use 512 byte logical sectors. Not much different than md-raid devices whose stripe width or stride is much like a large physical sector.
> a few other "revolutionary" ideas that all had only one downside: nobody's going to break backwards compatibility, especially while the current system still "just works"
Actually, my guess is that - like most "revolutionary ideas" people throw out there - the real issue is you expect someone else to implement them. How about you come back when you've got a proof of concept that people can get excited about?
While the top quality stuff might last, my own personal experience with el cheapo SSDs is that they go bad quickly with moderate (in my case laptop) use due to shabby wear levelling. Others are also warning about (cheap) SSDs throwing away data too. Such SSDs are often the ones you are going to encounter so while the majority of SSDs out there show this behaviour I think it's a warning worth mentioning...
It's a lot of work to make even a PoC, and I've got work, school, a few other small projects, and a life. This kind of system would need a very careful design, a lot of experience, and deep knowledge of how the existing solutions work -- knowledge, skill, and experience isn't something you gain overnight. I'm sure that at some point in the future I will try actually implementing it, but at the moment this point seems a little bit distant.
nice one n/t
I think so too, but I'm allowed to hope he's feeling overly rich and generous, aren't I?
Frak me. Ronald, is that you....
When it comes to reads you can't really postpone them - if someone wants to play that music file now and you don't have it already in cache (and Linux already uses unused memory as cache) then you have to hit the disk.
When it comes to writes you can often delay them so long as no one is waiting on you telling them that you have written the data to the disk (in which case you have no choice but to hit the disk). This is already tunable via /proc/sys/vm/dirty_writeback_centisecs. Further, there are things like /proc/sys/vm/laptop_mode that will try to batch writes when other I/O was going to happen (e.g. when you play that music file all the writes can happen too). Of course, in the event of a crash you lose much more data (as it wasn't on the disk) and you create more disk contention. See the Lesswatts disk tips page for more details.
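To see what those knobs are currently set to, a minimal sketch along these lines should do. The two /proc paths are the ones named above; the helper names are made up, and the values are read, not changed.

```python
from pathlib import Path


def parse_tunable(text):
    """Parse the single integer that these /proc/sys/vm files contain."""
    return int(text.split()[0])


def read_vm_tunable(name, root="/proc/sys/vm"):
    """Read a tunable such as 'dirty_writeback_centisecs' or 'laptop_mode'."""
    return parse_tunable((Path(root) / name).read_text())


def writeback_interval_seconds(centisecs):
    """dirty_writeback_centisecs is expressed in hundredths of a second."""
    return centisecs / 100.0


if __name__ == "__main__":
    interval = read_vm_tunable("dirty_writeback_centisecs")
    print("writeback wakeup every", writeback_interval_seconds(interval), "s")
    print("laptop_mode =", read_vm_tunable("laptop_mode"))
```

Raising the writeback interval batches more writes together, at the cost of losing more unwritten data in a crash, as noted above.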
Context sensitive defrag - sounds like good sense to me, whichever hardware you use.
Who said a brief historical summary of the life of PCs wouldn't look totally bonkers ?
Did anyone else feel this guy lost all credibility when they read the bit where he wastes 1GB on /boot and uses LVM for a single volume as a second partition?
It's an SSD dude, space and overhead are already major concerns and you just exploded them..
There are defrag utilities that sort by last modified date.
I use 1GB for /boot because I'm a kernel developer and I end up experimenting with a large number of kernels (yes, on my laptop --- I travel way too much, and a lot of my development time happens while I'm on an airplane). In addition, SystemTap requires compiling kernels with debuginfo enabled, which makes the resulting kernels gargantuan --- it's actually not that uncommon for me to fill my /boot partition and need to garbage collect old kernels. So yes, I really do need a 1GB for /boot.
As far as LVM, of course I use more than a single volume; separate LV's get used for test filesystems (I'm a filesystem developer, remember), but more importantly, the most important reason to use LVM is because it allows you to take snapshots of your live filesystem and then run e2fsck on the snapshot volume --- if the e2fsck is clean you can then drop the snapshot volume, and run "tune2fs -C 0 -T now /dev/XXX" on the file system. This eliminates boot-time fsck's, while still allowing me to make sure the file system is consistent. And because I'm running e2fsck on the snapshot, I can be reading e-mail or browsing the web while the e2fsck is running in the background. LVM is definitely worth the overhead (which isn't that much, in any case).
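The snapshot-and-check routine described above could be scripted roughly as follows. This is a sketch that only builds the command lines and does not run them; the volume group and volume names, the snapshot name, and the 1G snapshot size are placeholders, and the tune2fs step should only be run if e2fsck exits cleanly.

```python
def snapshot_fsck_commands(vg, lv, snap="fsck-snap", snap_size="1G"):
    """Build the commands for checking a live filesystem via an LVM snapshot.

    Returns a list of argument lists, in order:
    create snapshot, check it read-only, drop it, reset the fsck counters.
    """
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap}"
    return [
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, origin],
        ["e2fsck", "-f", "-n", snapshot],              # forced, read-only check
        ["lvremove", "-f", snapshot],
        ["tune2fs", "-C", "0", "-T", "now", origin],   # only if e2fsck was clean
    ]


if __name__ == "__main__":
    # 'vg0' and 'root' are placeholder names for illustration.
    for cmd in snapshot_fsck_commands("vg0", "root"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before handing them to subprocess or pasting them into a root shell.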
Dave? Dave Cutler? Is that you? Racked with guilt, now, are we, Dave?
Look, some of us haven't even forgiven you for what you did to RSX-11D with that RSX-11M monstrosity, much less what you did cross-breeding ODS2 with DOS. Just 'cause KO cancelled that whole EPIC thing was no reason to become a Sith, dude!
Rant all you want. You're not forgiven. Not by anybody who had less-than-6-digit badge numbers.
> That was my idea when I've proposed an "object storage system" here on /. a few months ago: associate type and metadata with every file, making them more "object-like" (as in object-oriented programming). The storage system would know the behaviour of each object (whether it is likely to grow, or more likely to be modified in place, or probably not modified at all, etc), and would choose the most efficient way of storing every particular kind of data. I've also proposed separate namespaces for each process, capability-based security, dropping paths in favour of non-hierarchical tags, and a few other "revolutionary" ideas that all had only one downside: nobody's going to break backwards compatibility, especially while the current system still "just works".
Grouping of file data was done in reiser4 by introducing "fibers", where a "fiber" is a way to tell the FS to group file data according to some policy. Policies for commonly used extensions like *.c, *.h, *.o, *.mp3, etc., were built in; others could be added. This ensured that all *.o files were physically placed close to each other so that read-ahead and other nice things (like a smaller number of seeks) really do their job while compiling a big source tree like the Linux kernel.
Which raises the question: does Hans Reiser have a laptop, and SVN access?
Reiser4 was supposed to have a lot of metadata, at least eventually.
Making use of the metadata is not the hard thing, the issue is to make it fast, and try not to break too many APIs. I trusted Reiser on that.
Another group that already had that idea is MS. They have been messing around with that WinFS thing for at least a decade. At some point they were trying to use MS SQL Server, and I think that approach is what is keeping them from succeeding.