Optimizing Linux Systems For Solid State Disks

Mere mortals need more toy budget by wjh31 · 2009-02-21 03:27 · Score: 4, Insightful

I think the bigger challenge will be getting mere mortals to have a $400 toy budget to afford the SSD.

Re:Mere mortals need more toy budget by KibibyteBrain · 2009-02-21 03:36 · Score: 2, Insightful

Well, they will obviously go down in price eventually. The real price issue won't be affordability but value. Do most consumers really want what would seem, on average, to be a slightly faster drive, rather than an order of magnitude or two more storage? There have always been fast drive solutions in the past; they have never been very popular, and they quickly become obsolete. Eventually some sort of SSD will take over the market, but I don't believe this compromised-experience business model will sell them, unless cloud storage and internet everywhere become mainstream fast.

Re:Mere mortals need more toy budget by piripiri · 2009-02-21 03:50 · Score: 3, Informative

Sure. There are *lots* of reasons beyond speed to want SSDs.
And SSD drives are also shock-resistant.

Re:Mere mortals need more toy budget by Anonymous Coward · 2009-02-21 04:31 · Score: 2, Interesting

As other components become less noisy, the "solid state" electronics' acoustic noise becomes audible. It isn't necessarily faulty electronics, just badly designed with no consideration for vibrations due to electromagnetic fields changing at audible frequencies. These fields subtly move components and this movement causes the acoustic noise. Most often it is a power supply or regulation unit which causes high pitched noises. Old tube TV sets often emit noise at the line frequency of the TV signal (ca. 15.6kHz for PAL, ca. 15.8kHz for NTSC).
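The quoted figures follow from multiplying lines per frame by frames per second. Here is a quick check. It assumes the standard 625-line, 25 frame/s PAL timing and the 525-line, 30000/1001 frame/s colour NTSC timing. Monochrome NTSC ran at exactly 30 frames/s, which gives 15.75 kHz.

```python
from fractions import Fraction


def line_frequency_hz(lines_per_frame: int, frames_per_second: Fraction) -> float:
    """Horizontal scan (line) frequency = lines per frame * frames per second."""
    return float(lines_per_frame * frames_per_second)


pal = line_frequency_hz(625, Fraction(25))
ntsc_colour = line_frequency_hz(525, Fraction(30000, 1001))
ntsc_mono = line_frequency_hz(525, Fraction(30))

for name, hz in (("PAL", pal), ("NTSC colour", ntsc_colour), ("NTSC mono", ntsc_mono)):
    print(f"{name}: {hz / 1000:.3f} kHz")
```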

Re:Mere mortals need more toy budget by MooUK · 2009-02-21 11:01 · Score: 2, Funny

Surely, if you can't hear over 9kHz, that makes you the insensitive one?

Re:Agreed .. But equally important is ... by ultrabot · 2009-02-21 03:37 · Score: 2, Informative

However, for many of us who require better-than-average data security, the matter of SSDs' read/write behaviour makes the devices extremely vulnerable to analysis and discovery of data that its owner/author believes to be inaccessible to others: 'secure wiping', or the lack thereof, is the issue.

Obviously you should be encrypting your sensitive data.

Also, it should be no problem to write a bootable CD/USB that does a complete wipe. Just write over the whole disk, erase, repeat. No wear leveling will get around that.

--
Save your wrists today - switch to Dvorak

Is it only linux? by jmors · 2009-02-21 03:37 · Score: 4, Interesting

This article makes me wonder whether any OS is really properly optimized for SSDs. Has there been any analysis of whether Windows machines make proper use of solid state disks? Perhaps the problem goes beyond just Linux?

--
The Matrix is real... but I'm only visiting!

Re:Is it only linux? by Jurily · 2009-02-21 03:49 · Score: 2, Informative

unfortunately the default 255 heads and 63 sectors is hard coded in many places in the kernel, in the SCSI stack, and in various partitioning programs; so fixing this will require changes in many places.
Looks like someone broke the SPOT rule.
As for other OSes:

Vista has already started working around this problem, since it uses a default partitioning geometry of 240 heads and 63 sectors/track. This results in a cylinder boundary which is divisible by 8, and so the partitions (with the exception of the first, which is still misaligned unless you play some additional tricks) are 4k aligned.
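The alignment claim is simple modular arithmetic on the fake geometry. The sketch below assumes 512-byte logical sectors and a 4 KiB alignment target, as in the quoted text. It checks whether the first cylinder boundary, and the traditional sector-63 start of the first partition, land on a 4 KiB boundary.

```python
SECTOR_BYTES = 512
ALIGN_BYTES = 4096  # assumed 4 KiB alignment target, as in the quoted text


def cylinder_sectors(heads: int, sectors_per_track: int) -> int:
    """Sectors per cylinder for a (fake) CHS geometry."""
    return heads * sectors_per_track


def is_4k_aligned(start_sector: int) -> bool:
    """True if a partition starting at this 512-byte sector is 4 KiB aligned."""
    return (start_sector * SECTOR_BYTES) % ALIGN_BYTES == 0


for heads in (255, 240):
    cyl = cylinder_sectors(heads, 63)
    # A partition starting at the first cylinder boundary begins at sector `cyl`.
    print(f"{heads} heads x 63 sectors: {cyl} sectors/cylinder, "
          f"boundary 4k-aligned: {is_4k_aligned(cyl)}")

# Traditional DOS layout: the first partition starts one track in, at sector 63.
print("partition at sector 63 4k-aligned:", is_4k_aligned(63))
```

With 255 heads a cylinder is 16065 sectors, and 16065 mod 8 = 1. So only every eighth cylinder boundary is 4 KiB aligned. With 240 heads a cylinder is 15120 sectors, a multiple of 8, so every boundary is aligned.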
Re:Is it only linux? by mxs · 2009-02-21 04:17 · Score: 2, Informative

Of course it goes beyond just Linux. Microsoft is aware of the problem and working on improving its SSD performance (they already did some things in Vista as the article states, and Windows 7 has more in store; google around to find a few slides from WinHEC on the topic).
The problem with Windows w.r.t. optimizing for SSDs is that it LOVES to do lots and lots of tiny writes all the time, even when the system is idle (and more so when it is not). Try moving the "prefetch" folder to a different drive. Try moving the system event log files to a different drive. And keep an eye out for applications that do extensive small writes to the system drive (or muck about in the registry a lot). These are the hard parts. The easier parts are making sure hibernation is disabled, pagefiles are not on the SSD (good luck getting Windows to not use pagefiles at all; it is possible, but painful even if you have a dozen gigs of memory), prefetching is disabled, the filesystem is properly aligned, printer spools are moved elsewhere, etc. With only the tools Windows provides, it is painful to attempt to prolong your SSD's life (this is not just about performance; remember that you only have a limited number of erases until the drive is toast).
There are some solutions; MFT for Windows (http://www.easyco.com/) provides a block device that consolidates many small writes into larger ones and does not overwrite anything unless absolutely necessary (i.e. changes are written onto the disk sequentially; overwriting only takes place once you run out of space). It is very, very costly, but it does its job well. Performance skyrockets, drive longevity improves by an order of magnitude.
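The general idea described here can be sketched as a toy remapping layer: coalesce small writes in a buffer, then write them out sequentially instead of overwriting in place. This is not EasyCo's implementation. The class and its parameters are hypothetical, and a real layer would also have to garbage-collect the stale copies left behind in the log.

```python
class LogStructuredWriter:
    """Toy log-structured remapper: buffers small logical writes and flushes
    them as one sequential segment, never overwriting a physical block in place."""

    def __init__(self, segment_blocks: int):
        self.segment_blocks = segment_blocks
        self.buffer: dict[int, bytes] = {}   # logical block -> pending data
        self.log: list[bytes] = []           # physical blocks, append-only
        self.mapping: dict[int, int] = {}    # logical block -> index into log
        self.segments_written = 0

    def write(self, lba: int, data: bytes) -> None:
        self.buffer[lba] = data              # repeated writes to one block coalesce
        if len(self.buffer) >= self.segment_blocks:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        for lba, data in sorted(self.buffer.items()):
            self.mapping[lba] = len(self.log)  # any older copy becomes garbage
            self.log.append(data)
        self.buffer.clear()
        self.segments_written += 1           # one large sequential write

    def read(self, lba: int) -> bytes:
        if lba in self.buffer:
            return self.buffer[lba]
        return self.log[self.mapping[lba]]


w = LogStructuredWriter(segment_blocks=4)
for lba, data in [(10, b"a"), (3, b"b"), (10, b"c"), (7, b"d"), (1, b"e")]:
    w.write(lba, data)
print(w.segments_written, len(w.log), w.read(10))
```

The two writes to block 10 collapse into one entry before anything reaches the medium. That coalescing is where the longevity gain in this kind of scheme comes from.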
You can also use hacks such as Windows SteadyState; this also streamlines writes (but adds another layer of indirection). Performance improves, but you have to deal with SteadyState issues. EWF (the Enhanced Write Filter) also works (it is less of a GUI-driven system, though it provides largely the same services, even on Windows 2000/XP). You have to be careful, though: if your system tends to lose power or crash, all the changes since the last boot will be lost. EWF can be made to write out all the changes it has accumulated, but after that, the only way to re-enable it is to restart the system.
Windows is not particularly nice to SSDs when used as a system disk. For a data partition it is not quite as bad (although if you do many small writes, you might still run into heaps of trouble). The optimizations described here for Linux apply to Windows as well (aligning filesystem blocks to erase blocks and 4k NAND pages). You would also want to move anything that does lots of small writes to a different (spinning) disk: system logs, for instance, and most spool directories. You'd also want to make absolutely sure that access time updates are disabled; each of those is, essentially, a write (even if ultimately consolidated).
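On the Linux side, access-time updates are turned off with the standard noatime mount option, for example in /etc/fstab. The small sketch below scans text in /proc/mounts format for real block-device mounts that still lack it. The function name is hypothetical.

```python
def mounts_without_noatime(proc_mounts_text: str) -> list[str]:
    """List mount points of /dev-backed filesystems that lack the noatime option.

    Input uses the /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    flagged = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("/dev/"):
            continue  # skip malformed lines and virtual filesystems such as proc
        mount_point, options = fields[1], fields[3].split(",")
        if "noatime" not in options:
            # relatime reduces, but does not eliminate, access-time writes
            flagged.append(mount_point)
    return flagged


sample = "/dev/sda1 / ext3 rw,noatime 0 0\n/dev/sda2 /home ext3 rw,relatime 0 0\nproc /proc proc rw 0 0\n"
print(mounts_without_noatime(sample))
```

On a live system, pass it the contents of /proc/mounts instead of the sample string.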
Re:Is it only linux? by NekoXP · 2009-02-21 04:29 · Score: 5, Insightful

Yeah, hard disk manufacturers.
Since they moved to large disks which require LBA, they've been fudging the CHS values returned by the drive to expose the maximum size to legacy operating systems. Since when did a disk have 255 heads? Never. It doesn't even make sense anymore when most hard disks are single-platter (and therefore have 1 or 2 heads) and SSDs don't have heads at all.
What they need to do is define a new command for accurately reporting the optimal layout of the disk: on an SSD this would report the erase block size or similar, on a hard disk how many sectors are in a cylinder, without fucking around with some legacy value designed in the 1980s.
Re:Is it only linux? by tonyr60 · 2009-02-21 08:00 · Score: 4, Informative

Sun's new 7000 series storage arrays use them, and that series runs OpenSolaris. So I guess Solaris has at least some SSD optimisations... http://www.infostor.com/article_display.content.global.en-us.articles.infostor.top-news.sun_s-ssd_arrays_hit.1.html
Re:Is it only linux? by c0t0d0s0.org · 2009-02-21 19:07 · Score: 2, Informative

You should look at the L2ARC and separate ZIL (ZFS Intent Log) features of ZFS in Solaris and OpenSolaris. They use SSDs in the way you want.

Ironically I was just going out to buy a small one by earthforce_1 · 2009-02-21 03:41 · Score: 3, Informative

If I mount /home on a separate drive (good to do when upgrading), the rest of the Linux file system fits nicely on a small SSD.

--
My rights don't need management.

Re:SSD's should have no problem with fragmentation by von_rick · 2009-02-21 03:44 · Score: 3, Insightful

From economics, let's turn our attention to optimizing this toy of ours. The thing with SSDs is that they don't have a read/write head to worry about. This means that no matter where the data is stored in the device, all we need to do is specify the fetch location, and the logic circuits select that block to extract the data from. From what I've heard, SSDs have an algorithm that assigns different blocks to store the data so that the memory cells in a single location aren't overused.
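The block-assignment idea described above is usually called dynamic wear leveling. The toy sketch below assumes a controller that always writes to the least-erased free physical block and erases the stale copy afterwards. Real controllers are far more elaborate, and the class here is hypothetical.

```python
import heapq


class DynamicWearLeveler:
    """Toy dynamic wear leveling: each write of a logical block goes to the
    least-erased free physical block, and the stale copy is erased and freed."""

    def __init__(self, physical_blocks: int):
        self.erase_counts = [0] * physical_blocks
        # Min-heap of (erase count, physical block) for blocks holding no live data.
        self.free = [(0, pb) for pb in range(physical_blocks)]
        heapq.heapify(self.free)
        self.mapping: dict[int, int] = {}  # logical block -> physical block

    def write(self, lba: int) -> int:
        _, pb = heapq.heappop(self.free)   # least-worn free block
        old = self.mapping.get(lba)
        self.mapping[lba] = pb
        if old is not None:
            self.erase_counts[old] += 1    # erase the stale copy before reuse
            heapq.heappush(self.free, (self.erase_counts[old], old))
        return pb


# Hammer a single logical block: the erases still spread over every physical block.
wl = DynamicWearLeveler(physical_blocks=10)
for _ in range(1000):
    wl.write(lba=0)
print(wl.erase_counts)
```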

--

Face your daemons!

No. Not Now. Not Ever. I'm Coming For All of You! by Anonymous Coward · 2009-02-21 04:06 · Score: 5, Funny

> Vista has already started working around this problem, since it uses a default partitioning geometry of 240 heads and 63 sectors/track. This results in a cylinder boundary which is divisible by 8, and so the partitions (with the exception of the first, which is still misaligned unless you play some additional tricks) are 4k aligned. So this is one place where Vista is ahead of Linux…

Although the technology it is used in is repugnant, NTFS has always been the One True Filesystem. It descended from DIGITAL's ODS2 (On Disk Structure 2) which traces back to the original Five Models (PDP 1, 8, 10, 11 and 12). You see, ODS was written by passionate people with degrees and rich personal lives in Massachusetts who sang and danced before the fall of humanity to the indignant Gates series who assimilated their young wherever possible and worked them into early graves during his epic battle with the Steves before the UNIX enemy re-emerged after a 25-year sleep and nuked the United States, draining all of its technological secrets to the other side of the world. Gates, realizing what he's done, now travels the universe seeking to rebuild his legacy by purifying humanity while the Steve series attempts to rebuild itself. Some of the original Five are still around, left to log on to Slashdot and witness what's left of the shadow of humanity still in the game as they struggle blindly around in epic circles indulging new and different ways to steal music, art and technology to make up for their lack of creativity long ago bred out of them by the Gates series.

Why pretend these are ordinary disks? by jensend · 2009-02-21 04:07 · Score: 4, Insightful

SSDs gradually gain more and more sophisticated controllers which do more and more to try to make the SSD seem like an ordinary hard drive, but at the end of the day the differences are great enough that they can't all be plastered over that way (the fragmentation/long-term use problems the story linked to are a good example). I know that (at present; this could and should be fixed) making these things run on a regular hard drive interface and tolerate being used with a regular FS is important for Windows compatibility, but it seems like a lot of cost could be avoided and a lot of performance gained by having a more direct flash interface and using flash-specific filesystems like UBIFS, YAFFS2, or LogFS. I have to wonder why vendors aren't pursuing that path.

Re:Why pretend these are ordinary disks? by NekoXP · 2009-02-21 04:36 · Score: 4, Interesting

Because Intel and the rest want to keep their wear-leveling algorithm and proprietary controller as much of a secret as possible so they can try to keep on top of the SSD market.
Moving wear-levelling into the filesystem - especially an open source one - effectively also defeats the ability to change the low-level operation of the drive when it comes to each flash chip - and of course, having a filesystem and a special MTD driver for *every single SSD drive manufactured* when they change flash chips or tweak the controller, could get unwieldy.
Backing them behind SATA is a wonderful idea, but this reliance on CHS values I think is what's killing it. Why is the Linux block subsystem still stuck in the 20MB hard-disk era like this?
Re:Why pretend these are ordinary disks? by gillbates · 2009-02-21 11:16 · Score: 2, Insightful

Why is the Linux block subsystem still stuck in the 20MB hard-disk era like this?
As one who had to tune the performance of hard drives at the kernel level, I can say with some authority that the Linux block subsystem is not at all stuck in the 20MB hard-disk era. In fact, everything is logical blocks these days, and it's the filesystem driver and IO schedulers which determine the write sequences. The block layer is largely "dumb" in this regard, and treats every block device as nothing more than a large array of blocks. A properly designed wear-leveling filesystem has no dependencies on the underlying hardware with one exception: block size. But seeing as every Linux filesystem since Ext2 has had the option of creating filesystems with different block sizes, I doubt this is, or ever will be, an issue.
The only real issue with wear-leveling filesystems is that they don't work well with conventional hard disks, largely due to the fact that with flash, the block access time is pretty much constant no matter where on the drive it is located. Hence, there's no need to schedule based on C/H/S values. Because of this disparity, there won't be ONE TRUE FILESYSTEM in Linux. This might actually be a good thing, if you've ever been privy to the debates over Reiserfs and Ext3...
The hardware SSD wear-levelling algorithms used by Intel, et al... are nothing special. Yes, they probably do offer higher performance than a general purpose filesystem, but performance is not their reason for existence. They exist largely because the overwhelming majority of consumer devices still use FAT32, which would destroy an SSD without wear-leveling very quickly. Think of how many flash chips are used in cameras, cellphones, thumb drives, etc... Intel had to do this just to access the non-Linux market.

--
The society for a thought-free internet welcomes you.

Re:Agreed .. But equally important is ... by Antique+Geekmeister · 2009-02-21 04:08 · Score: 3, Insightful

Such tools already exist. Even the venerable "dd if=/dev/zero of=/dev/sda" is extremely efficient at flushing a drive well beyond the ability of any but the most well-equipped recovery services, and it's a lot faster than the "overwrite with zeroes, then ones, then 101010..., then 010101..., then random data" approach used by some people with too much time on their hands and too much paranoia for casual data.
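The single dd pass mentioned here can be sketched in Python. Passing the multi-pattern sequence (zeroes, ones, 1010..., 0101...) reproduces the slower approach for comparison. This is a minimal sketch that assumes a POSIX system, and the function name is hypothetical. It needs write access to the device node and destroys everything on the target. Like dd, it only reaches blocks the drive exposes, so an SSD's hidden spare area is untouched.

```python
import os


def overwrite(path: str, patterns: tuple[bytes, ...] = (b"\x00",), chunk_size: int = 1 << 20) -> int:
    """Overwrite every byte of a file or block device, once per single-byte pattern.

    Returns the number of bytes covered by each pass. DESTROYS ALL DATA AT `path`.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # size of a regular file or block device
        for pattern in patterns:
            if len(pattern) != 1:
                raise ValueError("patterns must be single bytes, e.g. b'\\x55'")
            block = pattern * chunk_size
            os.lseek(fd, 0, os.SEEK_SET)
            remaining = size
            while remaining > 0:
                remaining -= os.write(fd, block[:min(chunk_size, remaining)])
            os.fsync(fd)  # flush this pass to the device before starting the next
        return size
    finally:
        os.close(fd)
```

The paranoid variant would be overwrite(device, (b"\x00", b"\xff", b"\xaa", b"\x55")). The default single zero pass is the dd equivalent.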

Re:No Money, Mo Problems by larry+bagina · 2009-02-21 04:16 · Score: 3, Funny

No worries. Once Barack Obama(1) pays for your house and car, he'll pay off your credit card bills.

(1) future generations of Americans

--
Do you even lift?

These aren't the 'roids you're looking for.

Re:Still too expensive... by NekoXP · 2009-02-21 04:22 · Score: 4, Informative

> So why should I get a SSD vs. a CF card?

10 times better performance and wear-leveling worth a crap.

Re:Take a look at Maemo . . . by DragonTHC · 2009-02-21 04:32 · Score: 2, Interesting

Don't forget android.

--
They're using their grammar skills there.

Re:SSD's should have no problem with fragmentation by v1 · 2009-02-21 04:34 · Score: 5, Interesting

I don't think this is going to be a significant problem when compared to normal seek time problems.

Let's say we have 100 KiB of data to read. With 512-byte blocks that requires 200 reads; with 4 KiB blocks, 25 reads.

For rotating discs: if the data is contiguous, we have to hope that all the blocks are on the same track. If they are, then there is 1 (potentially very costly) seek to get to the track with all the blocks on it. The cost of the seek depends on the track it's going to, the track it's on, and whether or not the drive is sleeping or spun down. Otherwise we also get another very short seek, which adds a bit of time to get to the next adjacent track. In the worst case, all 200 blocks are on different tracks, scattered randomly on the platter, requiring 200 seeks. Ouch ouch ouch.

For SSDs: what is important is the number of cells we have to read. Cells will be 4k in size. All seek times are essentially zero. In the best case, all data is contiguous and the start block is at the start of a cell, so read time boils down to how fast the flash can read 25 cells. The worst case is where the data is 100% fragmented, such that each of the 200 512-byte blocks resides in a different cell, requiring 200 cell reads (an 8-fold increase in time required). There will also be overhead in copying out the 512 bytes of data from each buffer and assembling things, but this time is negligible for this comparison.
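The SSD side of this comparison is easy to model: count how many distinct 4 KiB cells hold the requested 512-byte blocks. The sketch assumes 4 KiB cells and ignores controller caching, and the function name is hypothetical. 100 KiB of contiguous, cell-aligned data touches 25 cells. The fully fragmented layout touches 200, an 8-fold difference.

```python
def cells_touched(block_numbers, block_bytes: int = 512, cell_bytes: int = 4096) -> int:
    """Number of distinct flash cells that contain the given logical blocks."""
    return len({(b * block_bytes) // cell_bytes for b in block_numbers})


contiguous = range(200)                       # 100 KiB starting on a cell boundary
shifted = range(1, 201)                       # same data, starting one block late
scattered = [i * 8 for i in range(200)]       # every block in its own cell
print(cells_touched(contiguous), cells_touched(shifted), cells_touched(scattered))
```

The shifted case touches 26 cells, which shows why the "start block is at the start of a cell" condition matters even without fragmentation.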

While the 8x time increase (order N) looks significant, it's important to compare the probabilities involved, and just how bad things get. The most important difference between how these two drives react is the space between fragments. The 'worst case' for SSDs, 100% fragmentation, is highly unlikely. I don't even want to think about what a spinning disc would do if asked to perform a head seek for 100% of the blocks in, say, a 1 MB file. The read head would probably sing like a tuning fork at the very least. 2000 cell reads compared to 2000 seeks: the SSD will win handily every single time, even if the tracks on the disc are close.

If the spacing between fragments is anything near normal, say 30-100k, then there will be some seeking going on with the disc, and there will be some wasted cell reads with the SSD. But comparing an extra cell read with an extra head seek, again the SSD wins hands down. The advantage of the SSD actually shrinks as fragmentation decreases, because most fragments are going to cause a head seek, each of which will significantly widen the time gap. Also, a spinning disc will read in the blocks much faster than the cells on an SSD.

I realize the OP was more describing the possibility of "not so much bang for the buck as you are expecting" due to fragmentation, and I know the above hits more on comparing the two than what happens to the SSD, but if you consider the effects of fragmentation on a spinning disc, and then weigh how the impact compares with a SSD, it's easy to see that fragmentation that sent you running for the defrag tool yesterday may not even be noticeable with a SSD. So I'd call this a "non-issue".

What I'm waiting for is them to invest the same dev time in read speeds as write speeds. SSDs don't appear to be doing any interleaved reads - they're doing it for the writes because they're so slow. Though at this point I wonder if read speeds are just plain running into a bus speed limit with the SSDs?

--
I work for the Department of Redundancy Department.

Re:repeated re-write issues? by nedlohs · 2009-02-21 04:53 · Score: 4, Informative

It will outlast a standard hard drive by orders of magnitude so it's completely not an issue.

With wear leveling and the technology now supporting millions of writes it just doesn't matter. Here's a random data sheet: http://mtron.net/Upload_Data/Spec/ASIC/MOBI/PATA/MSD-PATA3035_rev0.3.pdf

"Write endurance: >140 years @ 50GB write/day at 32GB SSD"

Basically, the device will not run out of write cycles in practice: you can overwrite the entire device about one and a half times a day and it will last longer than your lifetime. Of course, it will fail due to other issues before then anyway.
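Unpacking the quoted rating shows what it implies. This back-of-the-envelope sketch assumes perfect wear leveling, no write amplification and 365-day years. At 50GB/day, a 32GB device gets about 1.56 full overwrites per day. Keeping that up for 140 years means every cell is rewritten roughly 80,000 times.

```python
def implied_full_device_writes(years: float, gb_per_day: float, capacity_gb: float) -> float:
    """How many times the whole device is rewritten to meet an endurance rating."""
    total_written_gb = years * 365 * gb_per_day
    return total_written_gb / capacity_gb


cycles = implied_full_device_writes(years=140, gb_per_day=50, capacity_gb=32)
print(f"{140 * 365 * 50:,} GB written in total, ~{cycles:,.0f} full-device writes")
print(f"{50 / 32:.2f} full-device overwrites per day")
```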

Can there be a mention of SSDs without this outdated garbage being brought up?

Re:Agreed .. But equally important is ... by raynet · 2009-02-21 04:54 · Score: 3, Informative

Unfortunately, flash SSDs usually have some percentage of sectors you cannot directly access; these are used for wear leveling and bad-sector remapping. So when you dd with /dev/zero, it is quite possible that some part of the original data is left intact. And there can be quite a lot of those sectors: I recall reading about one SSD that had 32GiB of flash in it but only 32GB available to the user, so 2250MiB was used for wear leveling and bad sectors (it helps to get better yields if you can have several bad 512KiB cells).
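Both points can be made concrete. The arithmetic below checks the 32GiB-versus-32GB gap. The toy model shows why a single full overwrite can leave old data in the spare area. The model is an assumption, not how any particular controller works: writes go out of place to the oldest free block, and replaced blocks are only marked free. In this FIFO model a second pass happens to recycle the stale blocks. A real controller gives no such guarantee, because its allocation and garbage-collection policy is hidden.

```python
def hidden_capacity_mib(raw_gib: int, user_gb: int) -> float:
    """Flash the user cannot address: raw GiB (2**30) minus advertised GB (10**9)."""
    return (raw_gib * 2**30 - user_gb * 10**9) / 2**20


def stale_blocks_after_wipe(logical_blocks: int, spare_blocks: int, passes: int = 1) -> int:
    """Toy flash translation layer: writes go out of place to the oldest free
    block (FIFO), and replaced blocks are only marked free, not erased.
    Returns how many physical blocks still hold the original data after
    `passes` full logical overwrites."""
    if spare_blocks < 1:
        raise ValueError("an out-of-place FTL needs at least one spare block")
    physical = ["secret"] * logical_blocks + [None] * spare_blocks
    mapping = list(range(logical_blocks))          # logical i -> physical i
    free = list(range(logical_blocks, logical_blocks + spare_blocks))
    for _ in range(passes):
        for lba in range(logical_blocks):          # what dd if=/dev/zero does
            new = free.pop(0)
            physical[new] = "zeros"
            free.append(mapping[lba])              # old copy: freed, still readable
            mapping[lba] = new
    return physical.count("secret")


print(f"{hidden_capacity_mib(32, 32):.0f} MiB hidden")
print(stale_blocks_after_wipe(logical_blocks=1000, spare_blocks=70), "blocks still hold old data")
```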

--
- Raynet --> .

Don't SSD's have a pre-set number of writes? by DJRumpy · 2009-02-21 05:04 · Score: 2, Funny

I'm just sitting here thinking. Doesn't an SSD have a preset number of writes in it due to its nature?

Does it really matter if they spread these writes around on the hard drive when the number of writes the drive is capable of doing is still the same in the end?

To drastically oversimplify, let's say that each block can be written to twice. Does it really matter if they used up the first blocks on the drive and just spread towards the end of the drive partition with general usage rather than jumping all over to try to spread the writes around?

Am I thinking about this the wrong way? What benefit does it give them to spread the writes around if the total number of writes doesn't change? Doesn't it just further fragment the files with little gain?

Re:Don't SSD's have a pre-set number of writes? by berend+botje · 2009-02-21 06:11 · Score: 2, Informative

Say you have 100 cells and can write 10 times to each cell.

Having every cell written to nine times: 100 * 9 = 900 writes and you still have a completely working disk.

Putting all 900 writes on the first few cells: you now have 90 defective cells. In fact, since you still have to rewrite the data to working cells, you have lost data, because there aren't enough working cells left.
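The arithmetic above can be replayed directly. This toy sketch assumes a cell dies once it has absorbed `endurance` writes, and that leveled writes are spread round-robin. The function is hypothetical.

```python
def worn_out_cells(cells: int, endurance: int, total_writes: int, leveled: bool) -> int:
    """Count cells that have absorbed `endurance` writes (and are therefore dead)."""
    writes = [0] * cells
    for i in range(total_writes):
        if leveled:
            target = i % cells                       # spread round-robin over all cells
        else:
            target = min(i // endurance, cells - 1)  # hammer one cell until it dies
        writes[target] += 1
    return sum(1 for w in writes if w >= endurance)


print(worn_out_cells(100, 10, 900, leveled=True))    # 0
print(worn_out_cells(100, 10, 900, leveled=False))   # 90
```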
Re:Don't SSD's have a pre-set number of writes? by MoonBuggy · 2009-02-21 07:44 · Score: 3, Informative

So in effect, instead of 'burning' out a specific section of an SDD, they will simply burn out the entire disk at once due to wear leveling?
Technically speaking, yes, the drive is more likely to go from 'all cells functioning' to 'many cells dead' in a relatively short amount of time due to wear levelling, whereas without it the mode of failure would be a more gradual reduction in functioning cells.
Practically speaking, however, these things support an awful lot of read/write cycles. On the order of a million or more, according to the data I could find. Unfortunately the Intel datasheet for the drive mentioned in the summary doesn't actually include write-cycle data, though.
A quick and dirty calculation (not taking into account block size, etc.) for drive lifetime is simply (capacity)*(write cycles)/(write speed).
Imagine a drive with no wear levelling. Say you have a 1GB file, the entirety of which is being continually rewritten to the same 1GB section of the drive. A million read/write cycles means you need to write approximately 1,000,000 GB (that's 1000TB!) to that 1GB section of drive to kill it. Again, somewhat inaccurate in the real world, but good enough for a back of the envelope estimate. Allowing a fairly generous write speed of 100MB/s, writing to that same 1GB area of disk 24/7, would burn it out in around 115 days - about 4 months. In that time, remember, you'll have generated 1000TB of data - that's certainly not insignificant, even for fairly major applications, but it could be done, and you're left with a drive that's got 1GB less capacity than it started with.
Now consider the same case with wear levelling. Assume for the sake of simplicity it functions perfectly, and ignore block size. On an 80GB drive, continually overwriting that same 1GB file, it will simply cycle through the entire 80GB capacity of the drive repeatedly rather than just hammering the same 1GB section. This means that you have suddenly increased the effective lifespan by a factor of 80 (again, not entirely real-world due to the fact that the drive would normally have data filling some of the rest of that 80GB, but sufficient to get the point across). You're now looking at over 25 years of continuous writing, by which time you will have generated 80 petabytes (80,000 TB) of data.
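Here is the same back-of-the-envelope model in code, using the figures assumed above: decimal units, a million cycles, 100MB/s and perfect wear leveling. It also shows that the total written at the end is 8 × 10^16 bytes, i.e. 80 petabytes.

```python
SECONDS_PER_DAY = 86_400
GB, MB = 10**9, 10**6


def lifetime_seconds(capacity_bytes: float, write_cycles: float, write_bytes_per_s: float) -> float:
    """(capacity) * (write cycles) / (write speed)"""
    return capacity_bytes * write_cycles / write_bytes_per_s


no_leveling = lifetime_seconds(1 * GB, 1_000_000, 100 * MB)   # same 1 GB region
leveling = lifetime_seconds(80 * GB, 1_000_000, 100 * MB)     # whole 80 GB drive
print(f"without wear leveling: {no_leveling / SECONDS_PER_DAY:.0f} days")
print(f"with wear leveling: {leveling / (365 * SECONDS_PER_DAY):.1f} years")
print(f"data written by then: {80 * GB * 1_000_000 / 10**15:.0f} PB")
```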
That's why wear levelling is a good thing. Even on a disk that's completely full (not something that happens particularly often, but still worth thinking about) the drive itself has some built in excess capacity to use for wear reduction.
Re:Don't SSD's have a pre-set number of writes? by tytso · 2009-02-21 09:49 · Score: 2, Informative

Flash using MLC cells is rated for 10,000 write cycles; flash using SLC cells is rated for 100,000 write cycles and is much faster from a write perspective. The key is write amplification: if you have a flash device with a 128k erase block size, in the worst case, assuming the dumbest possible SSD controller, each 4k singleton write might require erasing and rewriting a whole 128k erase block. In that case, you would have a write amplification factor of 32. Intel claims that with their advanced LBA redirection table technology, they have a write amplification of 1.1, with a wear-leveling overhead of 1.4. So if these numbers are to be believed, on average, over time, a 4k write might actually cost a little over 6k of flash write. That is astonishingly good.
The X25-M uses MLC technology, and is rated for a life of 5 years writing 100GB a day. In fact, if you have 80GB of flash and you write 100GB a day, with write amplification and wear-leveling overheads of 1.1 and 1.4, respectively, then over 5 years you will have used approximately 3,500 write cycles. Given that MLC technology is good for 10,000 write cycles, that means Intel's specification has roughly a factor of 3 safety margin built in. (Or put another way, the claimed write amplification factors could be nearly three times worse and they would still meet their 100GB/day, 5 year specification.)
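Here is the arithmetic from the last two paragraphs as a sketch. It assumes decimal GB for both the daily writes and the 80GB of flash, 365-day years, and that the two overhead factors simply multiply. Under those assumptions the five-year figure comes out near 3,500 cycles. That leaves a margin of a little under 3x against a 10,000-cycle rating. Counting the flash as 80 GiB instead would bring the figure closer to 3,300.

```python
def worst_case_write_amplification(erase_block_kib: float, write_kib: float) -> float:
    """Dumbest controller: every small write rewrites a whole erase block."""
    return erase_block_kib / write_kib


def flash_kib_per_write(write_kib: float, write_amp: float, wear_overhead: float) -> float:
    """Average flash actually written for one host write."""
    return write_kib * write_amp * wear_overhead


def cycles_used(gb_per_day: float, years: float, capacity_gb: float,
                write_amp: float, wear_overhead: float) -> float:
    """Average program/erase cycles consumed per flash cell over the period."""
    host_gb = gb_per_day * 365 * years
    return host_gb * write_amp * wear_overhead / capacity_gb


print(worst_case_write_amplification(128, 4))   # 32.0
print(f"{flash_kib_per_write(4, 1.1, 1.4):.2f} KiB of flash per 4 KiB write")
used = cycles_used(100, 5, 80, 1.1, 1.4)
print(f"{used:.0f} cycles used; margin vs 10,000: {10_000 / used:.2f}x")
```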
And 100GB a day is a lot. Based on my personal usage of web browsing, e-mail and kernel development (multiple kernel compiles a day), I tend to average between 6 and 10GB a day. When Intel surveyed system integrators (e.g., Dell, HP, et al.), the number they came up with as the maximum amount a "reasonable" user would tend to write in a day was 20GB. 100GB is 10 times my maximum observed write, and 5 times the maximum estimated amount that a typical user might write in a day.
For those of you who are Linux users, you can measure this number yourselves. Just use the iostat command, which will return the number of 512 byte sectors written since the system was booted. Take that number, and divide it by 2097152 (2*1024*1024) to get gigabytes. Then take that number and divide it by the number of days since your system was booted to get your GB/day figure.
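The same counter is available without iostat. The kernel exposes it in /proc/diskstats: the tenth whitespace-separated field of each line is the number of sectors written, always counted in 512-byte units. /proc/uptime gives seconds since boot. This is a minimal sketch, and the function name and sample line are illustrative.

```python
def gb_written_per_day(diskstats_text: str, device: str, uptime_seconds: float) -> float:
    """GiB written per day since boot for one device, from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # major minor name reads merged sectors_read ms_reading writes merged sectors_written ...
        if len(fields) > 9 and fields[2] == device:
            gib = int(fields[9]) / 2097152          # 2*1024*1024 sectors of 512 bytes per GiB
            return gib / (uptime_seconds / 86400)   # divide by days since boot
    raise ValueError(f"device {device!r} not found")


sample = "   8       0 sda 100 0 2000 10 50 0 4194304 20 0 30 30"
print(gb_written_per_day(sample, "sda", uptime_seconds=2 * 86400))
```

On a live system, pass the contents of /proc/diskstats and the first number in /proc/uptime.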
Re:Don't SSD's have a pre-set number of writes? by DJRumpy · 2009-02-21 10:15 · Score: 2, Insightful

TFA would disagree with you, as it states that write performance does indeed drop, sometimes to half the original performance or worse, due to the wear leveling and write combining techniques used. You're talking about read access times, whereas we're talking about write/erase access times.

Re:Still too expensive... by tinkerghost · 2009-02-21 05:12 · Score: 2, Informative

So why should I get a SSD vs. a CF card?

Your CF card is going to use the USB interface which maxes out at about 40Mbps as opposed to using an internal SSD's SATAII interface which maxes at 300Mbps. Not quite an order of magnitude, but close.

On the other hand, if you're going to use an external SSD connected to the USB port, then you wouldn't see any difference between the 2 in terms of speed. Lifespan might be longer w/ the SSD due to better wear leveling, but in either case you're probably going to lose or break it before you get to the fail point.

Re:Still too expensive... by Anonymous Coward · 2009-02-21 05:31 · Score: 5, Informative

A real SSD has several advantages over using CF cards, but not for the reasons you state.

With a simple plug adapter, CF cards can be connected to an IDE interface, so speeds won't be limited by interface speed. The most recent revision of the CF spec adds support for IDE Ultra DMA 133 (133 MB/s).

A couple of additional points, just because I love nitpicking:
- A USB 2.0 mass storage device has a practical maximum speed of around 25 MB/s, not 40 Mb/s.
- The so-called SATA II interface (that name is actually incorrect and is not sanctioned by the standardization body) has a maximum speed of 300 MB/s, not Mb/s.
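The MB/s versus Mb/s confusion is easy to untangle from the line rates. SATA's 3 Gbit/s link uses 8b/10b encoding, so only 8 of every 10 bits on the wire are data. USB 2.0 signals at 480 Mbit/s before its considerable protocol overhead. Here is a quick conversion sketch.

```python
def payload_mb_per_s(line_rate_bits_per_s: float, encoding_efficiency: float = 1.0) -> float:
    """Convert a raw line rate to payload throughput in decimal MB/s."""
    return line_rate_bits_per_s * encoding_efficiency / 8 / 1e6


sata_3g = payload_mb_per_s(3e9, encoding_efficiency=8 / 10)  # 8b/10b line coding
usb2_ceiling = payload_mb_per_s(480e6)  # signalling ceiling only, before protocol overhead
print(f"SATA 3 Gbit/s: {sata_3g:.0f} MB/s; USB 2.0 at most {usb2_ceiling:.0f} MB/s")
```

The practical ~25 MB/s for USB mass storage sits well below that 60 MB/s ceiling.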

Another file strategy - file segregation by f(x) by spineboy · 2009-02-21 05:54 · Score: 4, Insightful

Why not functionally group files to decrease or eliminate fragmentation? Or maybe this is already done.
For example - I have a large collection of MP3 files. They essentially do not change, as in I don't edit them, and rarely erase them. The file system could look at the type of file (mp3 vs. doc) and place it accordingly. It could also look at the last change to the file and place it in a certain area. Older unchanged files are placed in a tightly packed file area that is optimized and not fragmented.
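The proposed policy could look something like this toy classifier. It sends a file to a hypothetical "cold" region if its type is known to be static or it has not changed for a long time. The extension list and the 180-day threshold are arbitrary assumptions.

```python
import os
import time

STATIC_EXTENSIONS = {".mp3", ".flac", ".jpg", ".iso"}  # assumed write-once file types
COLD_AGE_SECONDS = 180 * 86400                          # assumed: unchanged for ~6 months


def placement_region(path: str, mtime: float, now: float) -> str:
    """Return 'cold' (pack tightly, rarely rewritten) or 'hot' for a file."""
    extension = os.path.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS or now - mtime > COLD_AGE_SECONDS:
        return "cold"
    return "hot"


now = time.time()
print(placement_region("music/song.MP3", mtime=now, now=now))                 # cold
print(placement_region("notes.doc", mtime=now - 3600, now=now))               # hot
print(placement_region("old/report.doc", mtime=now - 365 * 86400, now=now))   # cold
```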

--
..........FULL STOP.

Re:repeated re-write issues? by A+beautiful+mind · 2009-02-21 06:00 · Score: 4, Informative

There are a few tricks up the manufacturer's sleeve to make this slightly better than it really is:

1. Large block size (120k-200k?) means that even if you write 20 bytes, the disk physically writes a lot more. For log files and databases (quite common on desktops too; think of index DBs and the SQLite database Firefox uses to store search history), where tiny amounts of data are modified, this can add up rapidly. Something writes to the disk once every second? That's 16.5GB/day, even if you're only changing a single byte over and over.
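The 16.5GB/day figure corresponds to the top of the quoted block-size range. The sketch assumes each one-byte update costs a full 200 KiB block write, with no write combining by the controller.

```python
def daily_block_writes_gib(writes_per_second: float, erase_block_kib: float) -> float:
    """GiB physically written per day if every tiny update rewrites a whole block."""
    return writes_per_second * 86400 * erase_block_kib / (1024 * 1024)


print(f"{daily_block_writes_gib(1, 200):.1f} GiB/day")   # 16.5
print(f"{daily_block_writes_gib(1, 120):.1f} GiB/day")   # bottom of the quoted range
```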

2. Even if the memory cells do not die, fragmentation will occur due to the large block size (most of the cells will have a small amount of space used in them). There have been a few articles showing that even devices with advanced wear leveling technology like Intel's exhibit a large performance drop (less than half the read/write performance of a new drive of the same kind) after a few months of normal usage.

3. According to Tom's Hardware, unnamed OEMs told them that all the SSDs they tested under simulated server workloads got toasted after a few months of testing. Now, I wouldn't necessarily consider this accurate or true, but I sure as hell would not use SSDs in a serious environment until this is proven false.

--
It takes a man to suffer ignorance and smile
Be yourself no matter what they say

Re:repeated re-write issues? by berend+botje · 2009-02-21 06:07 · Score: 2, Informative

All nice and dandy, but these figures aren't exactly honest. In a normal scenario your filesystem consists largely of static data. These blocks/cells are never rewritten. Therefore the writes (for logfiles etc.) are concentrated on a small part of the disk, wearing it out rather more quickly.

Having had a few Compact Flash disks wear out in the recent past, I'm not exactly anxious to replace my server disks with SSDs.

Re:Still too expensive... by couchslug · 2009-02-21 06:34 · Score: 2, Interesting

If it's an older laptop or the mechanical hard disk died, go for it. Addonics make SATA CF adapters so you are not restricted to IDE CF adapters.

--
"This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."

Re:Still too expensive... by karnal · 2009-02-21 06:38 · Score: 2, Informative

Why is this informative? CF with an adapter is NOT USB.

From my experience, using an adapter puts it on the native interface - notably, with CF, it's easiest to put the device into a machine that has a native IDE (not SATA) interface. CF is pin compatible with IDE.

Now, with the current offering of SLC/MLC "drives", you can actually get better read/write performance, since they "RAID" the internal chips, for lack of a better term. I'm using a Transcend ATA-4 CF device that gets around 30MB/sec read/write in a machine in my garage; it's an SLC device that isn't their top of the line, but it was more cost-effective.

So, using the IDE/ATA-4 interface, the CF card gets lower CPU utilization than a USB device would. It still doesn't hit the 40MB/sec you quoted, but 40MB/sec is a pipe dream on USB in my experience.

--
Karnal

Re:Still too expensive... by Dr.+Ion · 2009-02-21 07:19 · Score: 2, Informative

Your CF card is going to use the USB interface

This is Informative?

CF cards are actually IDE devices. The adapters that plug CF into your IDE bus are just passive wiring; no protocol adapter is needed.

It's trivial to replace a laptop drive with a modern high-density CF card, and sometimes a great thing to do.

The highest-performance CF cards today use UDMA for even higher bandwidth.

Hi-Speed USB can't realistically get more than about 25MB/sec out of the cards through a USB-CF adapter, but you can do better by using the card's native bus.

Thinkpad X300 came with defrag tools by Britz · 2009-02-21 07:19 · Score: 2, Insightful

I purchased an X300 ThinkPad for the company this week and took a close look at it. I thought expensive business notebooks came without crapware, and I was sure the X300 would be optimized. But it had defrags scheduled! I always thought defragmenting was a no-no for SSDs. Now I am not sure anymore. I uninstalled it first thing. But who knows?

Re:Still too expensive... by pla · 2009-02-21 07:36 · Score: 2, Informative

So why should I get a SSD vs. a CF card?

CF works passably in WORM-like scenarios, where you basically use it in read-only mode and update it rarely and in big chunks. For random read/write access, CF lacks the wear leveling needed to give it a tolerable life expectancy. Thus you commonly see it used in embedded devices such as routers and dumb terminals, where you may update the firmware or OS every few months; you don't see it used much for real, live, writable filesystems.

It also tends to have rather poor performance, with reads in the sub-5MB/s range and writes taking forever. So again, using a 32MB CF to boot a router works great; using a 32GB CF as the system partition for a modern desktop PC (even with some workaround for the limited erase lifetime, such as a UnionFS over a ramdisk with commit-on-shutdown), you can expect 10+ minute boot times.

Organizing by partition by steveha · 2009-02-21 08:33 · Score: 3, Informative

Why not functionally group files to decrease or eliminate fragmentation? Or maybe this is already done.

In a Linux system, this is easily done, but few people bother.

Most of the write activity in Linux is in /tmp, and also in /var (for example, log files live in /var/log). User files go in /home.

So, you can use different partitions, each with its own file system, for /, /tmp, /home, and /var.

The major problem with this is that, if you guess wrong about how big a partition should be, it's a pain to resize things. So my usual thing is just to put /tmp on its own partition, and have a separate partition for / and for /home.

The /tmp partition and swap partition are put at the beginning of the disk, in the hope that seek penalties might be a little lower there. Then / has a generous amount of space, and /home has everything left over.
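To see whether an existing machine is already split up along these lines, a quick check with `os.path.ismount` from the Python standard library is enough. This is a minimal sketch; the directory list is just the example from this comment, and paths that do not exist are skipped.

```python
import os

# Directories this comment suggests giving their own partitions (example list).
CANDIDATES = ("/", "/tmp", "/var", "/home")


def separate_mounts(paths=CANDIDATES):
    """Map each existing directory to True if it is its own mount point."""
    return {p: os.path.ismount(p) for p in paths if os.path.isdir(p)}


if __name__ == "__main__":
    for path, mounted in separate_mounts().items():
        status = "separate filesystem" if mounted else "part of parent filesystem"
        print(f"{path:6} {status}")
```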

When a *NIX system runs out of disk space in /tmp, Very Bad Things happen. Far too much software was written in C by people who didn't bother to check error codes; things like disk writes don't fail often, but when /tmp is 100% full, every write fails. A system may act oddly when /tmp is full, without actually crashing or giving you a warning. So, the moral of the story is: disk is cheap, so if you give /tmp its own partition, make it pretty big; I usually use 4 GB now. However, if you run out of disk space in /var, it is not quite as serious. Your system logs stop logging. And, many databases are in /var so you may not be able to insert into your database anymore.
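The failure mode described above is usually an unchecked write error. In Python a full filesystem shows up as an `OSError` whose `errno` is `ENOSPC`, so the sketch below checks for it explicitly instead of assuming the write worked, and adds a simple free-space check. The function names and the 90% warning threshold are arbitrary choices for illustration.

```python
import errno
import os
import shutil
import tempfile


def usage_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def write_checked(path: str, data: bytes) -> bool:
    """Write data and force it to disk; return False if the disk is full."""
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            # fsync makes errors such as ENOSPC surface here, not later.
            os.fsync(f.fileno())
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise
    return True


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    if usage_fraction(tmp) > 0.9:  # arbitrary example threshold
        print(f"warning: {tmp} is over 90% full")
```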

The main Ubuntu installer is fast, because it wipes out the / partition and puts in all new stuff. So, if you have separate partitions for / and /home, life is good: you just let the installer wipe /, and your /home is safely untouched. It's annoying when you have /home as just a subdirectory on / and you want to run the installer. But, by default, the Ubuntu installer will make one big partition for everything; if you want to organize by partitions, you will need to set things up by hand.

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely

Re:One True File System by ggendel · 2009-02-21 08:41 · Score: 2, Insightful

Although the technology it is used in is repugnant, NTFS has always been the One True Filesystem.

I thought ZFS was.

And ZFS has native support for SSDs as an L2ARC, a second-level read cache (see http://www.c0t0d0s0.org/media/presentations/ssd.pdf). I have nothing but praise for ZFS. Simple to manage, reliable, fast. With the native CIFS server instead of user-space Samba, I've seen orders-of-magnitude performance improvements from Windows machines when doing networked file access.

Gary
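The idea behind an L2ARC is a second cache tier: blocks that fall out of the RAM cache are kept on the SSD, so re-reading them avoids the spinning disk. The sketch below is a deliberately simplified two-level LRU cache in Python. Real ZFS uses the Adaptive Replacement Cache and fills the L2ARC asynchronously, neither of which is modeled here, and the class name and tier sizes are hypothetical.

```python
from collections import OrderedDict


class TwoTierCache:
    """A RAM tier in front of an SSD tier; both evict least-recently-used."""

    def __init__(self, ram_slots: int, ssd_slots: int, backing_read):
        self.ram = OrderedDict()
        self.ssd = OrderedDict()
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.backing_read = backing_read  # slow path, e.g. the hard disk
        self.hits = {"ram": 0, "ssd": 0, "disk": 0}

    def read(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            self.hits["ram"] += 1
            return self.ram[key]
        if key in self.ssd:
            value = self.ssd[key]  # SSD copy stays; it is also promoted to RAM
            self.ssd.move_to_end(key)
            self.hits["ssd"] += 1
        else:
            value = self.backing_read(key)
            self.hits["disk"] += 1
        self._insert_ram(key, value)
        return value

    def _insert_ram(self, key, value):
        self.ram[key] = value
        if len(self.ram) > self.ram_slots:
            old_key, old_value = self.ram.popitem(last=False)
            # Demote the evicted RAM entry to the SSD tier instead of dropping it.
            self.ssd[old_key] = old_value
            self.ssd.move_to_end(old_key)
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)
```

With two RAM slots, reading keys 1, 2, 3 pushes key 1 out to the SSD tier, so a later read of key 1 is served from the SSD rather than the disk.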

Re:Still too expensive... by Mattsson · 2009-02-21 08:46 · Score: 2, Informative

Your CF card is going to use the USB interface which maxes out at about 40Mbps as opposed to using an internal SSD's SATAII interface which maxes at 300Mbps. Not quite an order of magnitude, but close.

There are three factual errors in that statement.
1. CF cards can be connected directly to the ATA port via a simple passive connector adapter and therefore have a theoretical maximum transfer speed of 133MB/s, which translates to roughly 1064Mbps (133 × 8). There are even adapters with room for both a master and a slave CF card in the same shape, size and connector position as a 2.5" ATA drive, specifically made to use CF cards in laptops.
2. USB 2.0 (Hi-Speed) is 480Mbps.
3. SATA II is 3000Mbps.

The big speed-difference between SSD and CF is due to the construction of the devices themselves, not the interface that connects them to the computer.
A fast CF-card can get you around 40MB/s and at the moment they also top out at 32GB sizes and they're not made to handle long term random write operations.
A fast SSD can get you all the way to the theoretical maximum of SATA II, around 300MB/s, and SSDs are available in much bigger sizes.
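Most of the confusion in this subthread comes from mixing megabits (Mb) and megabytes (MB). A small Python helper makes the conversion explicit. SATA uses 8b/10b line coding, so only 80% of its 3 Gbit/s line rate carries data, which is where the familiar 300MB/s figure comes from. USB 2.0 and parallel ATA are treated here as having no coding overhead, so their numbers are upper bounds; the function name is just for illustration.

```python
def payload_mb_per_s(line_rate_mbit: float, encoding_efficiency: float = 1.0) -> float:
    """Convert a raw line rate in Mbit/s into payload MB/s.

    encoding_efficiency is the fraction of line bits that carry data,
    e.g. 0.8 for the 8b/10b encoding used on SATA links.
    """
    return line_rate_mbit * encoding_efficiency / 8


if __name__ == "__main__":
    # Theoretical ceilings only; protocol overhead lowers real throughput further.
    print("SATA II (3 Gbit/s, 8b/10b):", payload_mb_per_s(3000, 0.8), "MB/s")  # 300.0
    print("USB 2.0 Hi-Speed (480 Mbit/s):", payload_mb_per_s(480), "MB/s")     # 60.0
    print("UDMA/133 expressed in Mbit/s:", 133 * 8)                            # 1064
```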

--
/.Mattsson - My native language is not English, so please don't whine over linguistic errors. (That's lame anyway...)

Re:chs no longer used by Hal_Porter · 2009-02-21 08:54 · Score: 2, Informative

CHS disappeared ages ago. The largest device it could address was ~8 GB (1024 cylinders * 255 heads * 63 sectors * 512 bytes).
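The ~8 GB figure falls straight out of the old BIOS geometry limits: a 10-bit cylinder number (0-1023), at most 255 usable heads and 63 sectors per track (sectors numbered from 1). A Python sketch of that arithmetic, plus the conventional CHS-to-LBA mapping, shows where it comes from; the function names are illustrative.

```python
SECTOR_BYTES = 512


def chs_capacity(cylinders: int, heads: int, sectors_per_track: int,
                 sector_bytes: int = SECTOR_BYTES) -> int:
    """Total addressable bytes for a given CHS geometry."""
    return cylinders * heads * sectors_per_track * sector_bytes


def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Conventional CHS-to-LBA mapping; cylinders and heads start at 0, sectors at 1."""
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector number out of range")
    return (c * heads + h) * sectors_per_track + (s - 1)


if __name__ == "__main__":
    limit = chs_capacity(1024, 255, 63)
    print(limit)           # 8422686720
    print(limit / 10**9)   # a little over 8.4 (decimal GB)
```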

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;

Re:What is different about SSD's? by tytso · 2009-02-21 09:03 · Score: 4, Informative

Because of this, I imagine that the author would like Linux devs to better support SSD's by getting non-flash file systems to support SSD better than they are today.

Heh. The author is a Linux dev; I'm the ext4 maintainer. If you read my actual blog posting, you'll see that I gave some practical things that can be done to support SSDs today just by better tuning the parameters given to tools like fdisk, pvcreate, mke2fs, etc., and I talked about some of the things I'm thinking about to make ext4 support SSDs better than it does today.....
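Much of that kind of tuning comes down to making partitions and filesystem structures start on erase-block boundaries. The Python sketch below assumes a 128 KiB erase block, an illustrative figure only, since real devices vary and the size is often unpublished. It shows that the traditional partition start at sector 63 is misaligned, while a fake geometry of 224 heads and 56 sectors per track gives cylinders that are an exact multiple of 128 KiB. The stride value, how many 4 KiB filesystem blocks fit in one erase block, is the kind of number mke2fs accepts through its `-E stride=` option. The function names are hypothetical.

```python
SECTOR_BYTES = 512
# Hypothetical erase-block size for illustration; real devices vary.
ERASE_BLOCK_BYTES = 128 * 1024


def is_aligned(start_sector: int, erase_block_bytes: int = ERASE_BLOCK_BYTES) -> bool:
    """True if a partition starting at start_sector begins on an erase-block boundary."""
    return (start_sector * SECTOR_BYTES) % erase_block_bytes == 0


def cylinder_bytes(heads: int, sectors_per_track: int) -> int:
    """Size of one (fake) cylinder for a given reported geometry."""
    return heads * sectors_per_track * SECTOR_BYTES


def stride_blocks(erase_block_bytes: int = ERASE_BLOCK_BYTES,
                  fs_block_bytes: int = 4096) -> int:
    """How many filesystem blocks fit in one erase block."""
    return erase_block_bytes // fs_block_bytes


if __name__ == "__main__":
    print("start at sector 63 aligned?", is_aligned(63))        # False
    print("start at sector 2048 aligned?", is_aligned(2048))    # True
    for heads, spt in [(255, 63), (224, 56)]:
        aligned = cylinder_bytes(heads, spt) % ERASE_BLOCK_BYTES == 0
        print(f"{heads} heads x {spt} sectors: cylinders aligned? {aligned}")
    print("stride in 4 KiB blocks:", stride_blocks())           # 32
```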

Re:Still too expensive... by drinkypoo · 2009-02-21 09:09 · Score: 2, Insightful

The modern hot-shit high-speed CF cards have wear leveling and do UDMA transfers; you get a CF-to-ATA adapter, not CF-to-USB, and they will outperform most hard disks.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:Another file strategy - file segregation by f(x by harry666t · 2009-02-21 11:38 · Score: 3, Interesting

That was my idea when I proposed an "object storage system" here on /. a few months ago: associate a type and metadata with every file, making files more "object-like" (as in object-oriented programming). The storage system would know the behaviour of each object (whether it is likely to grow, likely to be modified in place, or probably never modified at all, etc.) and would choose the most efficient way of storing each particular kind of data. I also proposed separate namespaces for each process, capability-based security, dropping paths in favour of non-hierarchical tags, and a few other "revolutionary" ideas that all had only one downside: nobody's going to break backwards compatibility, especially while the current system still "just works".

tytso by r00t · 2009-02-21 12:52 · Score: 2, Informative

"tytso" is Theodore Ts'o.

He and Remy Card wrote ext2. He and Stephen Tweedie wrote ext3. He and Mingming Cao wrote ext4.

He maintains the filesystem repair tool (e2fsck) and resizing tool for those filesystems.

He also created the world's first /dev/random device, maintained the tsx-11.mit.edu Linux archive site for many years, and wrote a chunk of Kerberos. He's been the technical chairman for many Linux-related conferences. He pretty much runs the kernel summit.

He's certainly not a kid. I think he's about to turn 40.

Really, Intel ought to give tytso piles of free SSD hardware before it goes on sale. This would help Intel by encouraging tytso to optimize Linux for Intel's SSD hardware.

Re:destruction is fun too by Cassini2 · 2009-02-21 14:20 · Score: 2, Interesting

So many choices!

This could be fun. Here are some more suggestions:

- Welder - The little chips don't last long against a good arc welder.
- 600 VAC - Why stop at a wall outlet?
- Tesla Coil - 200 kV is better than 600 VAC
- Lightning Rod - Why stop at 200 kV?
- Oxy-acetylene Torch - higher temperatures
- Plasma Cutter - even higher temperatures
- Nd:YAG Laser - Etch your name into the remains of the flash chip.
- Chew Toy for Dog - Don't underestimate some of those canines, although USB keys might not be good for them.
- Log-Splitting Practice. How good are you at aiming that Axe?
- Place USB in Cement Footings of a building. Do the mob thing.
- Rock crusher
- Grinding Machine
- Wood chipper / pulper
- Cement kiln
- Blast Furnace
- Industrial Press - Terminator Style!

I'm pretty sure that some of these machines can destroy industrial quantities of USB keys with little difficulty. Cement kilns and rock crushers can destroy just about anything. It would be interesting to see the resulting crushed rock in a piece of cement, though. It would be colorful.

Re:1gb /boot? lvm? wtf... by tytso · 2009-02-22 11:05 · Score: 4, Interesting

I use 1GB for /boot because I'm a kernel developer and I end up experimenting with a large number of kernels (yes, on my laptop --- I travel way too much, and a lot of my development time happens while I'm on an airplane). In addition, SystemTap requires compiling kernels with debuginfo enabled, which makes the resulting kernels gargantuan --- it's actually not that uncommon for me to fill my /boot partition and need to garbage collect old kernels. So yes, I really do need 1GB for /boot.

As far as LVM goes, of course I use more than a single volume; separate LVs get used for test filesystems (I'm a filesystem developer, remember). More importantly, the main reason to use LVM is that it allows you to take snapshots of your live filesystem and then run e2fsck on the snapshot volume --- if the e2fsck is clean you can then drop the snapshot volume, and run "tune2fs -C 0 -T now /dev/XXX" on the file system. This eliminates boot-time fscks, while still allowing me to make sure the file system is consistent. And because I'm running e2fsck on the snapshot, I can be reading e-mail or browsing the web while the e2fsck is running in the background. LVM is definitely worth the overhead (which isn't that much, in any case).
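A minimal Python sketch of the snapshot-then-check routine described above. Only the tune2fs invocation comes from the comment; the other commands use standard options (`lvcreate -s -L SIZE -n NAME` for a snapshot, `e2fsck -f -n` for a forced read-only check, `lvremove -f` to drop the snapshot). The volume group and logical volume names, the snapshot name and the 1G snapshot size are hypothetical placeholders, and the snapshot must be large enough to absorb writes to the origin while the check runs. It needs root on a real system, so the plan is built separately from the code that runs it.

```python
import subprocess


def snapshot_fsck_plan(vg: str, lv: str, snap_size: str = "1G") -> dict:
    """Build the commands for a background fsck of vg/lv via an LVM snapshot."""
    snap = f"{lv}-fsck-snap"  # hypothetical snapshot name
    return {
        "create": ["lvcreate", "-s", "-L", snap_size, "-n", snap, f"{vg}/{lv}"],
        "check": ["e2fsck", "-f", "-n", f"/dev/{vg}/{snap}"],
        "remove": ["lvremove", "-f", f"{vg}/{snap}"],
        # Only after a clean check: reset the mount count, record the check time.
        "mark_clean": ["tune2fs", "-C", "0", "-T", "now", f"/dev/{vg}/{lv}"],
    }


def run_snapshot_fsck(vg: str, lv: str, snap_size: str = "1G") -> bool:
    """Run the plan; returns True if the snapshot checked clean."""
    plan = snapshot_fsck_plan(vg, lv, snap_size)
    subprocess.run(plan["create"], check=True)
    try:
        # e2fsck exits with 0 when no errors were found.
        clean = subprocess.run(plan["check"]).returncode == 0
    finally:
        subprocess.run(plan["remove"], check=True)
    if clean:
        subprocess.run(plan["mark_clean"], check=True)
    return clean
```

Removing the snapshot in a `finally` block keeps a failed or interrupted check from leaving a snapshot behind to fill up.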
