Optimizing Linux Systems For Solid State Disks
tytso writes "I've recently started exploring ways of configuring Solid State Disks (SSDs) so they work most efficiently in Linux. In particular, I've been looking at Intel's new 80GB X25-M, which has fallen to a street price of around $400 and is thus within my toy budget. It turns out that the Linux Storage Stack isn't set up well to align partitions and filesystems for use with SSDs, RAID systems, and 4k sector disks. There is also some interesting configuration and tuning that we need to do to avoid potential fragmentation problems with the current generation of Intel SSDs. I've figured out ways of addressing some of these issues, but it's clear that more work is needed to make it easy for mere mortals to efficiently use next generation storage devices with Linux."
If I mount /home on a separate drive (good to do when upgrading), the rest of the Linux file system fits nicely on a small SSD.
My rights don't need management.
Sure. There are *lots* of reasons beyond speed to want SSDs.
And SSD drives are also shock-resistant.
> So why should I get a SSD vs. a CF card?
10 times better performance and wear-leveling worth a crap.
It will outlast a standard hard drive by orders of magnitude so it's completely not an issue.
With wear leveling and the technology now supporting millions of writes it just doesn't matter. Here's a random data sheet: http://mtron.net/Upload_Data/Spec/ASIC/MOBI/PATA/MSD-PATA3035_rev0.3.pdf
"Write endurance: >140 years @ 50GB write/day at 32GB SSD"
Basically the device will fail before it runs out of write cycles. You can overwrite the entire device twice a day and it will last longer than your lifetime. Of course it will fail due to other issues before then anyway.
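For anyone who wants to check the datasheet figure, here is a rough sketch of the arithmetic in Python. It assumes perfect wear leveling and ignores block size and spare area, so treat it as a back-of-the-envelope check and not a statement about how the vendor derived the number. The function names are made up for illustration.

```python
def drive_writes_per_day(daily_write_gb, capacity_gb):
    """How many times the whole device is overwritten per day."""
    return daily_write_gb / capacity_gb


def implied_full_writes(years, daily_write_gb, capacity_gb):
    """Total full-device overwrites implied by an endurance claim.

    Assumes writes are spread perfectly evenly across the device.
    """
    return years * 365 * daily_write_gb / capacity_gb


# Figures quoted from the datasheet line above: 140 years, 50GB/day, 32GB SSD.
per_day = drive_writes_per_day(50, 32)
total = implied_full_writes(140, 50, 32)
print(f"{per_day:.2f} full overwrites per day")
print(f"{total:,.0f} full overwrites over the claimed lifetime")
```

Under those assumptions, 50GB/day on a 32GB device is about one and a half full overwrites per day, and the 140-year claim works out to roughly 80,000 overwrites of every cell.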
Can there be a mention of SSDs without this out-dated garbage being brought up?
Unfortunately flash SSDs usually have some percentage of sectors you cannot directly access; these are used for wear leveling and bad sector remapping. So when you dd with /dev/zero, it is quite possible that some part of the original data is left intact. And there can be quite a lot of those sectors. I recall reading about one SSD that had 32GiB of flash in it but only 32GB available to the user, so about 2250MiB was used for wear leveling and bad sectors (it helps to get better yields if you can tolerate several bad 512KiB cells).
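The 2250MiB figure is just the gap between binary and decimal gigabytes. A small Python sketch of that conversion (the function name is hypothetical, and real drives may reserve more or less than this):

```python
GIB = 2**30  # binary gigabyte, how flash chips are sized
GB = 10**9   # decimal gigabyte, how drive capacity is advertised
MIB = 2**20


def spare_area_mib(raw_gib, user_gb):
    """Flash that is present on the device but not exposed to the user, in MiB."""
    return (raw_gib * GIB - user_gb * GB) / MIB


print(f"{spare_area_mib(32, 32):.0f} MiB hidden from the user")
```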
- Raynet --> .
A real SSD has several advantages over using CF cards, but not for the reasons you state.
With a simple plug adapter, CF cards can be connected to an IDE interface, so speeds won't be limited by interface speed. The most recent revision of the CF spec adds support for IDE Ultra DMA 133 (133 MB/s)
A couple of additional points, just because I love nitpicking:
- A USB 2.0 mass storage device has a practical maximum speed of around 25 MB/s, not 40 Mb/s.
- The so-called SATA II interface (that name is actually incorrect and is not sanctioned by the standardization body) has a maximum speed of 300 MB/s, not Mb/s.
There are a few tricks up the manufacturer's sleeve to make this slightly better than it really is:
1. large block size (120k-200k?) means that even if you write 20 bytes, the disk physically writes a lot more. For logfiles and databases (quite common on desktops too, think of index dbs and sqlite in firefox for storing the search history...) where tiny amounts of data are modified, this can add up rapidly. Something writes to the disk once every second? That's 16.5GB / day, even if you're only changing a single byte over and over.
2. Even if the memory cells do not die, due to the large block size, fragmentation will occur (most of the cells will have a small amount of space used in them). There have been a few articles reporting that even devices with advanced wear leveling technology like Intel's exhibit a large performance drop (less than half of the read/write performance of a new drive of the same kind) after a few months of normal usage.
3. According to Tomshardware, unnamed OEMs told them that all the SSD drives they tested under simulated server workloads got toasted after a few months of testing. Now, I wouldn't necessarily consider this accurate or true, but I sure as hell would not use SSDs in a serious environment until this is proven false.
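The 16.5GB/day number in point 1 can be reproduced with a few lines of Python. This is a worst-case sketch: it assumes every small write forces a full block rewrite with no caching or coalescing in the controller, and it takes the upper 200k guess (as 200 KiB) for the block size. Real drives will usually do better.

```python
KIB = 2**10
GIB = 2**30
SECONDS_PER_DAY = 86_400


def physical_bytes_per_day(block_bytes, writes_per_second=1.0):
    """Bytes physically written per day if each logical write rewrites one block."""
    return block_bytes * writes_per_second * SECONDS_PER_DAY


def write_amplification(block_bytes, logical_bytes):
    """Ratio of bytes physically written to bytes the application asked for."""
    return block_bytes / logical_bytes


daily = physical_bytes_per_day(200 * KIB)
print(f"{daily / GIB:.1f} GiB/day for one tiny write per second")
print(f"{write_amplification(200 * KIB, 20):.0f}x amplification for a 20-byte write")
```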
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
Technically speaking, yes, the drive is more likely to go from 'all cells functioning' to 'many cells dead' in a relatively short amount of time due to wear levelling, whereas without it the mode of failure would be a more gradual reduction in functioning cells.
Practically speaking, however, these things support an awful lot of read/write cycles. On the order of a million or more, according to the data I could find. Unfortunately the Intel datasheet for the drive mentioned in the summary doesn't actually include write-cycle data, though.
A quick and dirty calculation (not taking into account block size, etc.) for drive lifetime is simply (capacity)*(write cycles)/(write speed).
Imagine a drive with no wear levelling. Say you have a 1GB file, the entirety of which is being continually rewritten to the same 1GB section of the drive. A million read/write cycles means you need to write approximately 1,000,000 GB (that's 1000TB!) to that 1GB section of drive to kill it. Again, somewhat inaccurate in the real world, but good enough for a back of the envelope estimate. Allowing a fairly generous write speed of 100MB/s, writing to that same 1GB area of disk 24/7, would burn it out in around 115 days - about 4 months. In that time, remember, you'll have generated 1000TB of data - that's certainly not insignificant, even for fairly major applications, but it could be done, and you're left with a drive that's got 1GB less capacity than it started with.
Now consider the same case with wear levelling. Assume for the sake of simplicity it functions perfectly, and ignore block size. On an 80GB drive, continually overwriting that same 1GB file, it will simply cycle through the entire 80GB capacity of the drive repeatedly rather than just hammering the same 1GB section. This means that you suddenly increased the effective lifespan by a factor of 80 (again, not entirely real-world due to the fact that the drive would normally have data filling some of the rest of that 80GB, but sufficient to get the point across). You're now looking at over 25 years of continuous writing, by which time you will have generated 80 petabytes of data.
That's why wear levelling is a good thing. Even on a disk that's completely full (not something that happens particularly often, but still worth thinking about) the drive itself has some built in excess capacity to use for wear reduction.
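Here is the same quick and dirty formula, (capacity)*(write cycles)/(write speed), as a Python sketch. It keeps the simplifications used above: perfect wear levelling, no block-size effects, decimal units, and a million cycles per cell taken as an assumption, not a measured figure.

```python
GB = 10**9
MB = 10**6
PB = 10**15
SECONDS_PER_DAY = 86_400


def lifetime_days(levelled_bytes, write_cycles, write_speed_bytes_per_s):
    """Days of continuous writing before the levelled area is worn out."""
    seconds = levelled_bytes * write_cycles / write_speed_bytes_per_s
    return seconds / SECONDS_PER_DAY


def total_written_pb(levelled_bytes, write_cycles):
    """Total data written by the time every cell has used up its cycles."""
    return levelled_bytes * write_cycles / PB


CYCLES = 1_000_000   # assumed endurance per cell
SPEED = 100 * MB     # the "fairly generous" sustained write speed

no_levelling = lifetime_days(1 * GB, CYCLES, SPEED)     # same 1GB hammered
with_levelling = lifetime_days(80 * GB, CYCLES, SPEED)  # spread over 80GB

print(f"no wear levelling:   {no_levelling:.0f} days")
print(f"with wear levelling: {with_levelling / 365.25:.1f} years, "
      f"{total_written_pb(80 * GB, CYCLES):.0f} PB written")
```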
Sun's new 7000 series storage arrays use them, and that series runs OpenSolaris. So I guess Solaris has at least some SSD optimisations... http://www.infostor.com/article_display.content.global.en-us.articles.infostor.top-news.sun_s-ssd_arrays_hit.1.html
Why not functionally group files to decrease or eliminate fragmentation? Or maybe this is already done.
In a Linux system, this is easily done, but few people bother.
Most of the write activity in Linux is in /tmp, and also in /var (for example, log files live in /var/log). User files go in /home.
So, you can use different partitions, each with its own file system, for /, /tmp, /home, and /var.
The major problem with this is that, if you guess wrong about how big a partition should be, it's a pain to resize things. So my usual thing is just to put /tmp on its own partition, and have a separate partition for / and for /home.
The /tmp partition and swap partition are put at the beginning of the disc, in hopes that seek penalties might be a little lower there. Then / has a generous amount of space, and /home has everything left over.
When a *NIX system runs out of disk space in /tmp, Very Bad Things happen. Far too much software was written in C by people who didn't bother to check error codes; things like disk writes don't fail often, but when /tmp is 100% full, every write fails. A system may act oddly when /tmp is full, without actually crashing or giving you a warning. So, the moral of the story is: disk is cheap, so if you give /tmp its own partition, make it pretty big; I usually use 4 GB now. However, if you run out of disk space in /var, it is not quite as serious. Your system logs stop logging. And, many databases are in /var so you may not be able to insert into your database anymore.
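To illustrate the "check your error codes" point, here is a minimal Python sketch that looks at how full the temp directory is and treats a disk-full error as a condition to handle instead of ignoring it. The helper names are hypothetical; only standard library calls are used.

```python
import errno
import shutil
import tempfile


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def write_checked(path, data):
    """Write data to path; return False if the disk is full, True on success.

    Any other I/O error is re-raised so it is not silently swallowed.
    """
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:  # "No space left on device"
            return False
        raise


tmp = tempfile.gettempdir()
print(f"{tmp}: {free_fraction(tmp):.0%} free")
```

A caller that gets False back can warn the user or fall back to another location, which is exactly what the badly behaved software described above fails to do.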
The main Ubuntu installer is fast, because it wipes out the / partition and puts in all new stuff. So, if you have separate partitions for / and /home, life is good: you just let the installer wipe /, and your /home is safely untouched. It's annoying when you have /home as just a subdirectory on / and you want to run the installer. But, by default, the Ubuntu installer will make one big partition for everything; if you want to organize by partitions, you will need to set things up by hand.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Because of this, I imagine that the author would like Linux devs to better support SSDs by getting non-flash file systems to handle them better than they do today.
Heh. The author is a Linux dev; I'm the ext4 maintainer, and if you read my actual blog posting, you'll see that I gave some practical things that can be done to support SSDs today just by better tuning the parameters given to tools like fdisk, pvcreate, mke2fs, etc., and I talked about some of the things I'm thinking about to make ext4 support SSDs better than it does today.
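As a rough illustration of the alignment issue, the Python sketch below checks whether a partition's starting sector falls on a given boundary and finds the next sector that does. The 128 KiB boundary is only an example value standing in for an erase block size, and the helper names are made up; check your own drive's geometry before relying on any particular number. Sector 63 is the traditional DOS-style default start for the first partition.

```python
SECTOR_BYTES = 512
KIB = 2**10


def is_aligned(start_sector, boundary_bytes, sector_bytes=SECTOR_BYTES):
    """True if the partition's first byte sits on a multiple of boundary_bytes."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


def next_aligned_sector(start_sector, boundary_bytes, sector_bytes=SECTOR_BYTES):
    """Smallest sector >= start_sector that is aligned to boundary_bytes."""
    sectors_per_boundary = boundary_bytes // sector_bytes
    # Ceiling division, then scale back up to a sector number.
    return -(-start_sector // sectors_per_boundary) * sectors_per_boundary


EXAMPLE_BOUNDARY = 128 * KIB  # assumed erase block size, for illustration only

for start in (63, 2048):
    print(start, is_aligned(start, EXAMPLE_BOUNDARY),
          next_aligned_sector(start, EXAMPLE_BOUNDARY))
```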