New Middleware Promises Dramatically Higher Speeds, Lower Power Draw For SSDs

Wear leveling by Anonymous Coward · 2014-05-23 23:37 · Score: 0

That's a description of wear leveling. Most wear leveling algorithms are proprietary and thus difficult to compare against. Different strategies can be applied, obviously, but in the end every algorithm can only write to empty bits and zero out entire erase blocks at a time. Their algorithm can't get around that. It's just how flash memory works.

Re:Wear leveling by anubi · 2014-05-24 00:14 · Score: 5, Informative

I was looking into that when I was checking out alternatives to sub-gigabyte hard drives to keep legacy systems ( DOS and the like ) alive.

SanDisk's CompactFlash memory cards ( intended for professional video cameras ) seemed to make great SSDs for older DOS systems when fitted with a CF to IDE adapter. I can format smaller CF cards to FAT16 ( using the DOS FDISK and FORMAT commands, very much like installing a raw magnetic drive ). With the adapter, the CF card looks and acts like a magnetic rotating hard drive. I had a volley of emails with SanDisk, and the gist of it was they did not advertise using their product in this manner, and they did not want to get involved in support issues, but it should work. They told me they had wear leveling algorithms in place, which was the driving force behind my emails with them. I was very concerned the File Allocation Table area would be very short lived because of the extreme frequency of it being overwritten. I would not like to give my client something that only works for a couple of months - that goes against everything I stand for.

So, I have a couple of SanDisk memories out there in the field on old DOS systems still running legacy industrial robotics... and no problems yet.

Apparently the SanDisk wear-leveling algorithms are working.

I can tell you this works on some systems, but not on others, and I have yet to figure out why. I can even format and have a perfectly operational CF in the adapter plate so it looks ( both physically and supposedly electronically ) like a magnetic IDE drive in one system ... but another system ( say an old IBM ThinkPad ) won't recognize it. However a true magnetic drive swaps out nicely - albeit the startup files may need to be changed from one system to another.

--
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Re:Wear leveling by csirac · 2014-05-24 00:27 · Score: 4, Informative

Many industrial computers have CF-card slots for this very application. I put together a few MS-DOS systems using SanDisk CF cards around 8 years ago and they're still going strong, using a variant of one of these cards which has a CF slot built-in (so no need for a CF -> IDE adapter): PCA-6751
Re:Wear leveling by Anonymous Coward · 2014-05-24 00:54 · Score: 0

Wear leveling is performed per erase block. It's not the writing which kills flash cells, it's the erase "flash". The FTL however deals with smaller units (pages), because always writing entire erase blocks (and thus always erasing an entire block before writing any part of it) would result in huge write amplification and kill the SSD in no time.
Re:Wear leveling by sribe · 2014-05-24 01:17 · Score: 2

Per how big data areas is wear leveling performed in an SSD? Maybe not for each 4kB block...
IIRC the erase/write block size is typically 128KB.
Re:Wear leveling by Anonymous Coward · 2014-05-24 01:42 · Score: 1

IBM ThinkPads want both ATA SECURITY and UNLOAD IMMEDIATE. If they don't detect it, they will bitch about it.
Re:Wear leveling by MarkRose · 2014-05-24 02:02 · Score: 1

It was 128 KB for smaller, older drives. Newer drives use larger blocks: for instance, the Samsung 840 EVO series uses an erase block size of 2 MB. Some devices even have an 8 MB erase block size. 8 KB page sizes are common now, too, much like how spinning rust moved to 4 KB sectors. Using larger pages and blocks allows for denser, cheaper manufacturing.

--
Be relentless!
Re:Wear leveling by Anonymous Coward · 2014-05-24 02:48 · Score: 0

Ick. Everything is ready for at max 1MiB erase blocks. Really. That's why we align partitions and everything inside (including filesystem structures, LVM data areas, etc) to 1MiB boundaries. This holds true both for Linux and Windows.
Devices with 2MiB erase blocks are not a good idea. Don't get them unless you're going to buy them as throw-away devices (which SSDs are too expensive to be, IMHO).
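The alignment argument above can be checked with a little arithmetic. The following is a minimal sketch, not any partitioning tool's actual logic: the function name `is_aligned`, the 512-byte sector size, and the example start sectors (2048 and 63) are assumptions chosen for illustration.

```python
MIB = 1024 * 1024

def is_aligned(start_sector, erase_block_bytes, sector_bytes=512):
    """Return True if a partition starting at start_sector begins
    exactly on an erase-block boundary."""
    return (start_sector * sector_bytes) % erase_block_bytes == 0

# A partition starting at sector 2048 begins at exactly 1 MiB.
print(is_aligned(2048, 1 * MIB))   # aligned for 1 MiB erase blocks
print(is_aligned(2048, 2 * MIB))   # not aligned for 2 MiB erase blocks
print(is_aligned(63, 1 * MIB))     # an old DOS-style start at sector 63 is not aligned
```

Under these assumptions, a 1 MiB-aligned layout only lines up with 2 MiB erase blocks when the start offset happens to be an even number of MiB, which is the concern the comment raises.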
Re:Wear leveling by Bengie · 2014-05-24 03:27 · Score: 1

Tracking 4KB blocks wouldn't be that bad for meta data. Like you said, assume 48bit pointers, then some extra metadata, so 64bit, which is 8 bytes. 1GB is 262,144 4KB blocks, which is only about 2MB of metadata per 1GB, which is only 0.2% overhead. They over-provision something like 10%-30% just for wear leveling.
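The estimate above is easy to reproduce. This is a small sketch of the arithmetic only; the function name `mapping_overhead` is hypothetical, and the 8-byte entry (a 48-bit pointer padded with extra metadata) is the assumption stated in the comment, not a property of any particular drive.

```python
def mapping_overhead(capacity_bytes, block_bytes=4096, entry_bytes=8):
    """Size of a flat logical-to-physical table with one entry per block."""
    entries = capacity_bytes // block_bytes
    table_bytes = entries * entry_bytes
    return entries, table_bytes, table_bytes / capacity_bytes

entries, table_bytes, ratio = mapping_overhead(1 << 30)  # 1 GiB of flash
print(entries)                 # number of 4 KB blocks
print(table_bytes / (1 << 20)) # table size in MiB
print(f"{ratio:.2%}")          # overhead as a fraction of capacity
```

For 1 GiB this gives 262,144 entries and a 2 MiB table, roughly 0.2% of capacity, which matches the figures quoted.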
Re:Wear leveling by Anonymous Coward · 2014-05-24 05:16 · Score: 0

I am running a lawn watering system using a Raspberry Pi, and this technique interests me. For my purposes, it would be great for the people who develop Raspbian Linux to adopt this scheme in their device drivers.
Re:Wear leveling by Anonymous Coward · 2014-05-24 07:07 · Score: 0

Indeed!
I am developing wearable computing solutions for health-related information using the Raspberry Pi and I am interested in this too: high-bandwidth access to an SSD would make a device that collects lots of telemetry, for all practical intents and purposes, ideal for jogging and biking. This solution for Raspbian or Arch would be very high on my list!
Re:Wear leveling by Anonymous Coward · 2014-05-24 07:26 · Score: 0

Apparently the SanDisk wear-leveling algorithms are working.
Not just working, but working damn great !
Over the years I've replaced the small default CF cards in many dozens of Cisco ASA 55xx units and Cisco 6906s with SUP-720 with plain, slightly larger SanDisk Ultra CF cards (2, 4 and 8 GB) that I first bought from a local camera shop. The oldest have been running 24/7/365 since 2006. I've kept a few extras as spares, but have never had to replace any of them.
My home router (Soekris net4801) has been running 24/7/365 since 2005: Linux for a couple of years, then pfSense (FreeBSD) from 2008 on. Because it has so little memory (128MB) it doesn't use a ramdisk, but instead writes everything directly to CF. I have had no problems with it. I thought it would fail after some time and keep a spare CF, but there is no indication it will fail before I upgrade the device.
It looks like SanDisk Cruzer Fit USB flash drives are also very durable. I have 16 and 32GB models in many ARM-based Linux devices that run continuously, used as ordinary physical disks, soft-mirrored under VGs and LVs. Most of them have now been doing SLA etc. monitoring for over two years without any problems.
Great stuff indeed.
Re:Wear leveling by anubi · 2014-05-24 13:24 · Score: 1

Thanks!

I was wondering why my ThinkPads would not see these.

--
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Re:Wear leveling by anubi · 2014-05-24 13:32 · Score: 1

Thank you for the link, CSI! I did not know about that one. It looks like a very handy little board that can retrofit into other ISA systems. ( Yes, I can get desperate enough to fire up Eagle and layout a custom ISA motherboard for something like this if the dying dinosaur is important enough ).

--
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Re:Wear leveling by csirac · 2014-05-24 20:06 · Score: 1

I believe Advantech will still happily sell you ISA backplanes. At the same time I put these things together, I had to reverse-engineer and fabricate some old I/O cards which had "unique" (incompatible with readily available cards) interrupt register mappings, also with EAGLE - great software!
I should mention: the MS-DOS system has outlasted three replacement attempts (two windows-based applications were from the original vendor who sold the MS-DOS system). There's just something completely unbreakable about the old stuff.

Excuse my naiveté by HuguesT · 2014-05-23 23:55 · Score: 2

Could the incoming data be written first in either a RAM or SLC cache while the formatting is going on ?

Re:Excuse my naiveté by Immerman · 2014-05-24 02:17 · Score: 1

It could, but if you're writing large amounts of data (considerably larger than your write cache) that won't actually help much. It also doesn't change the number of erasures required to get the data written, which is the primary speed and power bottleneck.
This technique is sort of like using that blank corner on a piece of scratch paper before you throw it away - the blank spot was there anyway, and by making a habit of reuse you can significantly reduce the number of fresh sheets of paper (erasures) that you need to write the same amount of data. Especially if you consider the fact that, given random data sizes, you are just as likely to have a block with 90% left unused as only 10%.

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Re:Excuse my naiveté by Bengie · 2014-05-24 03:15 · Score: 1

Only the last block of a file will have a "random" chance of usage.
Re:Excuse my naiveté by Immerman · 2014-05-24 03:46 · Score: 1

Certainly - and if you're typically writing one huge file all at once this will have minimal benefit. But if you're filling the cache with lots and lots of small writes then this technique has potential.

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Re:Excuse my naiveté by Jane+Q.+Public · 2014-05-24 05:33 · Score: 1

Only the last block of a file will have a "random" chance of usage.
Sure, BUT... blocks on SSDs can be as large as 16k and even larger. That's a lot of wasted space, especially if you have lots of small files.

The real underlying issue here, though, is the number of lifetime write-cycles. Newer SSD technology (MLC in particular) actually made the number smaller, not larger. It really, really must get larger before SSDs will be mature. That's the central reason why all these workarounds are necessary in the first place. And that's what they are: work-arounds.

Maybe the awkwardly-named memristor or some similar technology will replace it soon. Or maybe somebody will come up with a way to give cells more write cycles or even "infinite", as magnetic disks basically are. (Yes, I know it is not really infinite, but AFAIK there is no practical limit to write cycles.)
Re:Excuse my naiveté by Anonymous Coward · 2014-05-24 05:37 · Score: 0

Folks, folks...Flash is last year's tech.
Phase Change Memory is the shiznit.
The world need be in awe of yet another over hyped Jew.
Re:Excuse my naiveté by Anonymous Coward · 2014-05-24 22:44 · Score: 0

Or maybe somebody will come up with a way to give cells more write cycles or even "infinite", as magnetic disks basically are. (Yes, I know it is not really infinite, but AFAIK there is no practical limit to write cycles.)
Not even close to practical. The magnetic disk manufacturers implemented wear leveling back when the drives were in the 200MB-range. Before that disks wore out even quicker than flash disks and I didn't even use swap-files then.
There is a huge difference between unlimited number of writes and undefined number of writes.
In critical applications, a bad number is better than an undefined one. At least you can calculate a life-time and design after that.
Re:Excuse my naiveté by OffTheWallSoccer · 2014-05-26 02:07 · Score: 1

Not even close to practical. The magnetic disk manufacturers implemented wear leveling back when the drives were in the 200MB-range. Before that disks wore out even quicker than flash disks and I didn't even use swap-files then.
There is a huge difference between unlimited number of writes and undefined number of writes.
In critical applications, a bad number is better than an undefined one. At least you can calculate a life-time and design after that.
No, sir. HDDs (at least up until I stopped writing FW for them in 1999) did not have any wear leveling algorithms. In other words, the translation of LBA to physical location on the media (sometimes called Physical Block Address or PBA) is fixed, other than for defective sectors which have been remapped. So if an O.S. wrote to a specific LBA or range of LBAs repeatedly (think paging/swap file or hibernate file), those PBAs would be written to more frequently (or at least at a different rate) than other PBAs across the drive.

crappy journalism as always by Anonymous Coward · 2014-05-23 23:56 · Score: 3, Informative

http://techon.nikkeibp.co.jp/english/NEWS_EN/20140522/353388/?SS=imgview_e&FD=48575398&ad_q

they came up with a better scheme for mapping logical to physical. however, the results aren't as good as all the news sources say.

Compared To What? by rsmith-mac · 2014-05-24 00:00 · Score: 5, Insightful

I don't doubt that the researchers have hit on something interesting, but it's hard to make heads or tails of this article without knowing what algorithms they're comparing it to. The major SSD manufacturers - Intel, Sandforce/LSI, and Samsung - all already use some incredibly complex scheduling algorithms to collate writes and handle garbage collection. At first glance this does not sound significantly different than what is already being done. So it would be useful to know just how the researchers' algorithm compares to modern SSD algorithms in both design and performance. TFA as it stands is incredibly vague.

Re:Compared To What? by Anonymous Coward · 2014-05-24 00:08 · Score: 1

It was tails

"causes fragmented data by Anonymous Coward · 2014-05-24 00:18 · Score: 0

and lowers the drive's life"

what a load of crap.

Fragmented data has nothing to do with the drive's life. WRITING data does. The advantage of writing in various areas is SPEED. Writing elsewhere "whilst the old area is formatted" is also for SPEED, AND load wear leveling. Erasing data takes longer than writing new data, so erasing is naturally done in the background.

Fragmentation of data doesn't even affect the speed.

Re:"causes fragmented data by jones_supa · 2014-05-24 00:31 · Score: 2

Fragmentation of data doesn't even affect the speed.
Is this completely true? Because benchmarks show that even SSDs can read larger chunks much faster than small ones. So if a big file exists mostly on adjacent flash cells, it would be faster to read? Of course operating system -level defragmentation might not be very useful because the physical data might be mapped into completely different areas due to wear leveling. Thus the drive would have to perform defragmentation internally.
Re:"causes fragmented data by leuk_he · 2014-05-24 00:42 · Score: 2

If data is fragmented over multiple blocks, it requires multiple reads. But this kind of fragmentation is not as bad as on an HDD, where you had a seek time of 7-8 ms. Matching the block size of the SSD to the block size of the FS is an effective performance enhancement.
Modern SSDs have read limits. Every 10.000 reads or so the data has to be refreshed. The firmware will do this silently.
Re:"causes fragmented data by tomhath · 2014-05-24 00:46 · Score: 1

The linked article is pretty bad. This link has a little more information. Apparently the saving they claim comes from filling the pages that already have valid data more completely rather than writing to new pages within the same block (the reduced fragmentation claim); then the garbage collector has fewer pages to relocate when erasing that block (the speed-up claim). Of course if the garbage collection happens in the background the savings are moot.
Re:"causes fragmented data by K.+S.+Kyosuke · 2014-05-24 02:13 · Score: 1

Because benchmarks show that even SSDs can read larger chunks much faster than small ones
Well, why shouldn't prefetching and large block reading work on the SSD controller level? I assume that Flash chips are still slower than DRAMs, and the controller has to do some ECC work, not to mention figuring out where to read from (which may also be something that is kept in Flash chips in non-volatile form, unless you want your logical-to-physical mapping completely scrambled when the drive is turned off). So prefetching the data into the controller's memory should help hide latencies even if the Flash chips themselves are true RAM chips, as in the equal random access time to all blocks thingy, and it should work most efficiently when reading operations are performed on larger chunks at once.

--
Ezekiel 23:20
Re:"causes fragmented data by ArcadeMan · 2014-05-24 02:44 · Score: 1

If you're reading a lot of small files, that's a lot of open/read/close commands. If you're reading a big file, that's one open command, multiple sequential read commands, one close command.
And if it's anything like SPI, there's not even multiple read commands, you just keep clocking to read the data sequentially.

--
Get free satoshi (Bitcoin) and Dogecoins
Re:"causes fragmented data by AdamHaun · 2014-05-24 03:05 · Score: 1

I'm not sure about NAND flash, which is a block device, but in NOR flash sequential reads are faster due to prefetching, where the next memory word is read before the CPU has finished processing the first one. For NAND, I'd imagine you could start caching the next page. Not sure if that's actually done, though.

--
Visit the
Re:"causes fragmented data by gregben · 2014-05-24 03:51 · Score: 1

> Modern SSDs have read limits. Every 10.000 reads or so the data has to be refreshed. The firmware will do this silently.
Please provide reference(s). I have never seen any indication of this, or at least there is no read limit for the flash memory itself. You can read from it indefinitely just like static RAM, without "refresh" as required for DRAM.
Re:"causes fragmented data by altstadt · 2014-05-24 04:47 · Score: 1

Google: flash read disturb
The Micron presentation is rather old, but gives a good overview of how Flash works.
Re:"causes fragmented data by OffTheWallSoccer · 2014-05-26 02:23 · Score: 1

GP is correct about read disturb. NAND vendors will specify a policy for a given part, but it is typically N reads to a particular area (i.e. one block, which is 256 or 512 or 1024 pages), after which that area must be erased. So even if page 7 in a block is never read, but page 100 is read a lot, the drive will have to rewrite that whole block eventually.
(I work for a NAND controller vendor.)
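The per-block policy described above can be sketched as a simple counter. This is an illustrative model only: the class name `ReadDisturbTracker`, its methods, and the default limit of 10,000 (borrowed from the figure mentioned earlier in the thread) are assumptions, since real limits are part-specific and the actual firmware is proprietary.

```python
class ReadDisturbTracker:
    """Counts reads per erase block and queues a block for rewrite
    once its read count reaches the limit."""

    def __init__(self, num_blocks, read_limit=10_000):
        self.read_limit = read_limit
        self.reads = [0] * num_blocks
        self.refresh_queue = []

    def on_read(self, block):
        # Any page read counts against the whole block.
        self.reads[block] += 1
        if self.reads[block] == self.read_limit:
            self.refresh_queue.append(block)

    def on_refresh(self, block):
        # Data has been rewritten elsewhere and the block erased,
        # so its counter starts over.
        self.reads[block] = 0
        if block in self.refresh_queue:
            self.refresh_queue.remove(block)


tracker = ReadDisturbTracker(num_blocks=4, read_limit=3)
for _ in range(3):
    tracker.on_read(1)       # hammer one block
tracker.on_read(0)           # a single read elsewhere
print(tracker.refresh_queue) # only the heavily read block is queued
```

Because the counter is per block rather than per page, a page that is never read still gets rewritten along with its heavily read neighbours, as the comment points out.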

Original by GrahamJ · 2014-05-24 00:30 · Score: 2

In the original-ish article here they go into a bit more detail but the "conventional scheme" they're comparing against appears to be just straight mapping. It would be interesting to see how this stacks up against some of the more advanced schemes employed in today's SSDs.

Wear leveling by jones_supa · 2014-05-24 00:43 · Score: 1

Per how big data areas is wear leveling performed in an SSD? Maybe not for each 4kB block, because that would require hundreds of megabytes of extra data just for the remap pointers, if we assume that they each are 48 bits long. Also TRIM data (which blocks are "nuked" and not just zeroes) requires similar kind of extra space.

Certificate expired! by Anonymous Coward · 2014-05-24 01:34 · Score: 0

*.slashdot.org certificate

Issued by Geotrust SSL CA

Expired: Friday, 23 May 2014, 23:49:50 British Summertime.

Either a company that is too stupid to manage their certificates, or someone pretending to be www.slashdot.org.

Username is gnasher719, but I'm not logging in to a site that cannot be trusted.

Not wear leveling. by Immerman · 2014-05-24 02:08 · Score: 5, Interesting

Wear leveling is typically a system by which you write new data to the least-written empty block available, usually with some sort of data-shuffling involved to keep "stagnant" data from preventing wear on otherwise long-occupied sections. It sounds like this is a matter of not erasing the block first: For example if the end of a file has used 60% of a block and is then deleted, the SSD can still use the remaining 40% of the block for something else without first deleting it. Typically, as I understand it, once a block is written that's it until its page is erased - any unused space in a block remains unused for that erase cycle. This technique would allow all the unused bits at the end of the blocks to be reused without an expensive erase cycle, and then when the page is finally ready to be erased all the reused bits on the various blocks can be consolidated to fill a few fresh blocks.

It seems to me this could be a huge advantage for use cases where you have a lot of small writes so that you end up with lots of partially filled blocks. Essentially they've introduced variable-size blocks to the SSD so that one physical block can be reused multiple times before erasure, until all available space has been used. Since erasing is pretty much the slowest and most power-hungry operation on the SSD that translates directly to speed and power-efficiency gains.
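The conventional wear-leveling rule described at the start of this comment (write to the least-written empty block) can be sketched in a few lines. This is a toy selection function under assumed data structures, a list of erase counts and a set of free block indices; the name `pick_block` is hypothetical and no real controller is claimed to work exactly this way.

```python
def pick_block(erase_counts, free_blocks):
    """Choose the free block with the fewest erases so far.
    Ties are broken by the lowest block index."""
    return min(free_blocks, key=lambda b: (erase_counts[b], b))


erase_counts = [5, 2, 9, 2]
print(pick_block(erase_counts, {0, 2, 3}))  # block 3 has the fewest erases
print(pick_block(erase_counts, {1, 3}))     # tie between 1 and 3, lowest index wins
```

Note that this only spreads erases across free blocks; the data shuffling for "stagnant" blocks mentioned above would need a separate step.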

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.

Re:Not wear leveling. by Anonymous Coward · 2014-05-24 03:24 · Score: 0

From RTFS and your post, I have to agree that this is not wear leveling. Instead, they finally got the damn caching algorithm correct and that alone improves the write performance in a similar manner to what's happened with spinning disks
captcha=shorting
Re:Not wear leveling. by Anonymous Coward · 2014-05-24 03:51 · Score: 0

If that's correct and you can now write in sub-block sizes, this is huge. It used to require a read, combine and write back to the block if you needed to write a partial block.
Re:Not wear leveling. by Anonymous Coward · 2014-05-24 06:15 · Score: 0

That only happens without a flash translation layer (FTL), for example when you want to write to a non-empty page of a memory technology device (MTD) in Linux (as can be found in many embedded systems). You can read about the capabilities of (raw) flash devices and how the UBI system works on top of MTDs here. The FTLs in SSDs are proprietary, so there's no easy way to know how current flash controllers implement the FTL.
Re:Not wear leveling. by Guspaz · 2014-05-24 06:39 · Score: 1

You're incorrect. Writes can only happen at the page size, but there are multiple pages per block. If a block has unwritten pages, you can still write to the remaining pages.
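The page/block distinction can be made concrete with a toy model. This is a sketch under simplifying assumptions: the class name `FlashBlock` and the four-page block are illustrative, and real parts have hundreds of pages per block plus ordering constraints that are ignored here.

```python
class FlashBlock:
    """Toy erase block: pages are programmed one at a time,
    but can only be cleared by erasing the whole block."""

    def __init__(self, pages_per_block=4):
        self.pages = [None] * pages_per_block  # None means erased
        self.erase_count = 0

    def program(self, page_no, data):
        if self.pages[page_no] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_no] = data

    def erase(self):
        # Erase is all-or-nothing for the block.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


block = FlashBlock()
block.program(0, b"first")
block.program(1, b"second")   # unwritten pages in a used block are still writable
try:
    block.program(0, b"again")
except ValueError as err:
    print(err)                # rewriting a used page needs a block erase
```

The remaining unwritten pages stay usable until the block is erased, which is the point being made in the reply.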
Re:Not wear leveling. by Immerman · 2014-05-24 07:44 · Score: 1

You are correct, I got the terms switched in line with the confusion in the summary. Reread with that in mind and I think you'll find the rest is in order (i.e. they are rewriting a partially used page)

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Re:Not wear leveling. by Anonymous Coward · 2014-05-24 08:14 · Score: 0

They are not writing to "partially used pages". Flash can't do that. The minimum I/O size is fixed for a given part. They are really just proposing a different write strategy. Maybe their strategy really does work better than established write strategies, maybe it doesn't. With so little information, it's impossible to tell. They are definitely not doing anything that people thought to be impossible. They are not using flash memory in a way it has never been used before.
Re:Not wear leveling. by Immerman · 2014-05-24 08:27 · Score: 1

Are you certain? That sounds like what they're describing, and certainly the individual bits are capable (you're still just setting some of the bits that were reset in the last erase cycle), the rest is just the control hard/software. It's the reset that needs to be handled specially, so long as you are only setting bits that haven't been altered since the last erase there shouldn't be a problem. It seems to me that, at the crudest, you could simply read a partially filled block, add extra data to the unused (still erased) portion, and then re-write the block without erasing it first. The previously used portions would have the exact same data written to them, to no effect, and the freshly used portions would be updated from their reset status as per normal.
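The bit-level argument can be illustrated with a simplified model in which an erased byte reads as 0xFF and programming can only clear bits, so the stored result is the bitwise AND of old and new data. This is a sketch of that idealised behaviour only; the function name `program` is hypothetical, and the model deliberately ignores ECC, MLC cell behaviour, and any limit a real part places on repeated programming of a page.

```python
ERASED = 0xFF  # an erased byte has all bits set

def program(old: bytes, new: bytes) -> bytes:
    """Idealised programming: bits can go from 1 to 0, never back."""
    return bytes(o & n for o, n in zip(old, new))


page = bytes([0x12, 0x34, ERASED, ERASED])   # half used, half still erased

# Rewriting the same data plus new bytes in the erased tail works.
print(program(page, bytes([0x12, 0x34, 0xAB, 0xCD])).hex())

# Sending all-ones over the used part leaves it untouched.
print(program(page, bytes([ERASED, ERASED, 0xAB, 0xCD])).hex())

# Trying to change already programmed data corrupts it instead.
print(program(page, bytes([0x21, 0x34, ERASED, ERASED])).hex())
```

In this model the first two calls give the same result, while the third shows why changing existing data still requires an erase.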

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Re: Not wear leveling. by Anonymous Coward · 2014-05-24 09:18 · Score: 0

Normally you can write a page at a time, and erase a block of pages on NAND.
This doesn't sound that new, and log based file systems have already been developed to use raw NAND efficiently.
I'm not sure what is new here. Might just be PR.
Re:Not wear leveling. by Anonymous Coward · 2014-05-24 09:25 · Score: 0

I am certain that there are write size limitations in NAND flash. There are checksums. I am not certain that all parts can flip additional bits in an already written-to page without erasing the block first (also consider the potential problems (no pun intended) for MLC flash), but there are at least some SLC parts which can do that. But then you would at least have to read the page before writing it with the additional bits, and one of their selling points is speed, so I am certain that sub-page writes are not what their middleware is about. Page size is small enough for the SSD to provide a spinning disk abstraction without tricks like that. To write in smaller increments than the page size, you would need data that is small enough. But common filesystems already write data in 4K increments, so where are the tiny blocks of data going to come from? To write in smaller increments, you would also need to know that these bits are still unflipped, but any data written to that block was, again, written in larger increments (filesystem block size...), so the controller would have to inspect the data to know that a part of that write left bits usable, and then it would have to mask the bits that are used for another purpose in reads from that block, etc. etc. Yes, I am 100% sure they're not doing that.
Granted, the person who wrote the press release apparently thought that it needed some pizazz to be noteworthy, but is it really so difficult to believe that write strategies, which are as of now mostly proprietary, are an active research topic and that developing write strategies is sufficiently interesting without cunning low level manipulation?
Re:Not wear leveling. by Immerman · 2014-05-24 15:54 · Score: 1

Actually filesystems are typically *allocated* in 4k increments, but not necessarily *written* in such, it's easy enough on a magnetic drive to write only three bytes in the middle of a file, or only the bytes actually used in the last allocation block of each file, though caching systems may obscure that fact.
As for the writing mechanism, you're right, it would likely be a bit more complicated. On further reflection I would suspect that they wouldn't bother reading a block at all, just write the new data with unset bits over all the old data (all zeros? all ones? whatever tells the hardware "leave it in the previous, normally erased, state"). Likely they'd also need to include partial-block checksums as well, but assuming checksums are calculated in firmware that's just an implementation detail. As for knowing how many bits are actually used in a block, I would think that would be easy enough, even just two status-bits per block would tell you whether the last 0,1,2,or 3/4 of the block are unused. Imprecise, but a little under-utilization is unlikely to be a major problem.

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Re:Not wear leveling. by Anonymous Coward · 2014-05-25 00:05 · Score: 0

It's not possible to write just three bytes to a magnetic drive. If the storage interface allows such foolishness, a controller closer to the hardware will without a doubt read the corresponding sector, modify it in RAM and write it back. Storage hardware really isn't as simple as you seem to think it is. Abstractions provided to the software layer are, well, abstractions. The bits are not encoded independently in the actual hardware. There are even magnetic drives on the market where writing to a track requires rewriting neighboring tracks!

MOD PARENT UP by Anonymous Coward · 2014-05-24 02:38 · Score: 0

Yes, this is NOT wear leveling. You describe what is happening far better than the crappy article. This technique would mainly be an advantage with lots of small random writes, where "small" is less than half a block size.

Ok, I've RTFA. It is good for the cheaper stuff by Anonymous Coward · 2014-05-24 02:56 · Score: 0

The discovery is a change on the page scheduler of a SSD (where SSD also means the crappier stuff like SD cards and CF cards) done through something you can cheaply make and plug at a stage between the PHY (interface/protocol talks to the computer) and the FTL (flash translation layer that does scheduling and wear levelling).

This is really useful on the cheap stuff, but any modern high-end SSD worth something (Intel, Crucial/Micron, Samsung, Hitachi) already has scheduling algorithms that do better.

OTOH, USB "pendrives", SD cards, CF cards, and their close cousins "eMMC-based disk-on-chip" do _not_ have anything nearly as elaborate for the FTL. For those, this new translation-layer-before-the-translation-layer can really improve things on some workloads.

Heck, the cheaper USB-FTL one-chip solutions used on way too many pendrives cannot even do proper wear-leveling, they only wear-level around 1000 pages or so at the lower LBAs, which makes them unsuitable to anything but FAT16/FAT32 (and you better not repartition them either)...

Not a word of that is true by slashmydots · 2014-05-24 03:23 · Score: 1

"Currently, data cannot be directly overwritten onto the NAND chips used in the devices. Files must be written to a clean area of the drive whilst the old area is formatted"
Am I the only one that knows that's not remotely true? I don't even know where to start. So the SSD wants to write to location 0x00000032 but it's occupied by old data. First of all, no it isn't. TRIM already took care of that. But let's say you're using the SSD in Windows XP so TRIM doesn't work. So they claim the SSD writes data to a blank location on the drive temporarily, then erases the original intended location and later moves it back to that location to be contiguous? What's so damn special about that location? Just leave it in the blank location. They claim that causes fragmentation, which has no impact on the performance of an SSD in any way.

This is a useless invention from people who don't know how SSDs work.

Re:Not a word of that is true by OffTheWallSoccer · 2014-05-26 02:36 · Score: 1

So they claim the SSD writes data to a blank location on the drive temporarily, then erases the original intended location and later moves it back to that location to be contiguous? What's so damn special about that location? Just leave it in the blank location. They claim that causes fragmentation, which has no impact on the performance of an SSD in any way.
This is a useless invention from people who don't know how SSDs work.
You are correct. SSDs don't have a fixed LBA-to-physical arrangement, so host rewrites of an LBA will normally go to a new (erased) NAND location, with the drive updating its internal LBA map automatically (I.e. no need for TRIM of that LBA).
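The remapping behaviour described above can be sketched as a tiny flash translation layer. This is a toy under stated assumptions: the class name `TinyFTL`, page-granular mapping, and the dictionary standing in for flash are all illustrative, and garbage collection is left out entirely. Real FTLs are proprietary and far more involved.

```python
class TinyFTL:
    """Toy FTL: every host write of an LBA lands on a fresh erased page,
    and the LBA map is updated to point at it."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))  # erased physical pages
        self.lba_map = {}                   # LBA -> physical page
        self.invalid = set()                # stale pages awaiting erase
        self.flash = {}                     # physical page -> data

    def write(self, lba, data):
        if not self.free:
            raise RuntimeError("no erased pages left; garbage collection needed")
        new_page = self.free.pop(0)
        old_page = self.lba_map.get(lba)
        if old_page is not None:
            self.invalid.add(old_page)      # old copy is now stale, no TRIM required
        self.flash[new_page] = data
        self.lba_map[lba] = new_page
        return new_page

    def read(self, lba):
        return self.flash[self.lba_map[lba]]


ftl = TinyFTL(num_pages=4)
first = ftl.write(7, b"old")
second = ftl.write(7, b"new")      # same LBA, different physical page
print(first, second, ftl.read(7), ftl.invalid)
```

The host never sees the move: it keeps addressing LBA 7, while the stale physical page is reclaimed later.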

Make them work first by Threni · 2014-05-24 03:46 · Score: 0

How about SSDs that last as long as regular hard drives? Until then, I ain't going near them.

Re:Make them work first by Jane+Q.+Public · 2014-05-24 05:39 · Score: 1

That *IS* the basic problem.

However, they have gotten "good enough" for most use cases. Though I agree that it is in the "just barely" category. Limited rewrites are their one major problem at this time. If that can be improved, it would be a great advance for us all.
Re:Make them work first by regular_gonzalez · 2014-05-24 05:59 · Score: 1

You're in luck, as that time is right now!
http://techreport.com/review/2...

tl;dr - the Samsung 840 series is the only drive to really suffer problems but that's strictly relatively speaking; it's allocating from reserve capacity and to reach the point it's at now you'd have to have 150 GB of writes per day for 10 years, which is probably at least an order of magnitude higher than even a heavy standard user. And that's the consumer version -- the Intel SSD, aimed more at production / business environments, fares even better. Which mechanical hard drives do you use that support 150 GB of daily writes for 10 years?
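For scale, the daily-write figure above converts to a total volume with simple arithmetic. This is only a sketch of that conversion; the function name `total_written_tb` is hypothetical, and it assumes 365-day years and decimal units (1 TB = 1000 GB).

```python
def total_written_tb(gb_per_day, years, days_per_year=365):
    """Total data written, in decimal terabytes."""
    return gb_per_day * days_per_year * years / 1000


print(total_written_tb(150, 10))  # TB written at 150 GB/day for 10 years
print(total_written_tb(15, 10))   # one order of magnitude lighter use
```

At 150 GB per day for 10 years the total comes to 547.5 TB, so a user writing a tenth of that would need roughly a century to reach the same point.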

--
Due to circumstances beyond my control, I am master of my fate and captain of my soul.
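For scale, the 150 GB per day for 10 years quoted above can be totalled with plain arithmetic. The sketch assumes decimal units and ignores leap days; the variable names are only illustrative.

```python
# Back-of-envelope endurance figure from the comment above.
GB_PER_DAY = 150
YEARS = 10
DAYS_PER_YEAR = 365          # leap days ignored

total_gb = GB_PER_DAY * DAYS_PER_YEAR * YEARS
total_tb = total_gb / 1000   # decimal terabytes

print(f"{total_gb} GB written, about {total_tb} TB")
```

That is roughly 550 TB of host writes, which gives a sense of how far beyond typical desktop use the test had to go.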

The problem with this article... by AcquaCow · 2014-05-24 03:48 · Score: 1

...is that in a properly-designed SSD, there is no such thing as data fragmentation. You lay out the NAND as a circular log and write to every bit of it once before you overwrite, and maintain a set of pointers that translates LBAs to memory addresses.

Pretty much every SSD vendor out there figured this out a few years ago.

--

up 12 days, 22:30, 2 users, load averages: 993.20, 994.21, 994.56
*makes note to limit user processes...

Re:The problem with this article... by Anonymous Coward · 2014-05-24 04:16 · Score: 0

How does your circular log magically avoid fragmenting data when you want to write a file larger than the next free block?

Re:The problem with this article... by Bengie · 2014-05-24 04:42 · Score: 1

I may be missing something, but if you have a circular log and the head meets the tail, how can you not start fragmenting to fill the holes in the log? My understanding of circular logs is you just start writing over the oldest data, which you cannot do with permanent storage.

Re:The problem with this article... by tlhIngan · 2014-05-24 05:24 · Score: 1

I may be missing something, but if you have a circular log and the head meets the tail, how can you not start fragmenting to fill the holes in the log? My understanding of circular logs is you just start writing over the oldest data, which you cannot do with permanent storage.

That's where overprovisioning and write amplification come into play. The head NEVER meets the tail - the circular log is larger than the advertised size. E.g., a 120GB (120,000,000,000 byte) SSD would have 128GiB of flash. That difference is overprovisioning (and even older 128GB SSDs had 128GiB of flash). That overprovisioning accounts for bad blocks (up to 2% of flash is bad when new!), as well as ensuring there is a safe "landing zone" for new data, storage of the FTL tables (the "middleware" the article talks about), etc.

So there is always more physical storage available than exposed, and a periodic thread in the SSD firmware reclaims blocks that have been TRIMmed or overwritten (i.e., marked "dirty") by cleaning up the head and moving the unchanged data to the tail. (You need to move unchanged data too, otherwise slowly changing areas of disk will not wear evenly.)

The write amplification happens then - because you're causing more data to be written when no writes were issued by the host - writes of the data itself, and writes to the FTL tables to point to the new data location.

Corruption of the FTL tables is serious business - it's the primary cause of SSD failure, and easily repairable too (an ATA SECURE ERASE forces a reinitialization of the tables, putting the SSD back into full operation at the cost of the user data).

The real innovations for SSDs would be to be able to search FTL tables faster, update them more safely, and lessen their susceptibility to corruption.

(Modern SSDs are bottlenecked by SATA3, hence the move to PCIe SSDs).
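The overprovisioning figures in this comment can be checked with a few lines of arithmetic. This is a rough sketch: the `write_amplification` helper and its `gc_bytes` parameter are assumed names, and `gc_bytes` lumps together relocated user data and FTL table updates.

```python
GIB = 2**30
raw_flash = 128 * GIB                 # physical NAND, bytes
advertised = 120 * 10**9              # capacity exposed to the host, bytes

spare = raw_flash - advertised
op_ratio = spare / advertised         # overprovisioning relative to user capacity


def write_amplification(host_bytes, gc_bytes):
    """NAND bytes written per host byte; gc_bytes is data moved by cleanup."""
    return (host_bytes + gc_bytes) / host_bytes


print(f"spare area: {spare} bytes ({op_ratio:.1%} of advertised capacity)")
print(f"example WA: {write_amplification(100, 10)}")
```

With these numbers the spare area is about 17.4 GB, or roughly 14.5% of the advertised capacity, before bad blocks and table storage are subtracted.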

LWN? by JSG · 2014-05-24 04:07 · Score: 1

Have I stumbled into a new green themed version of LWN? The comments here are far too insightful and interesting for the usual /. fare. Can't even find the frist post.

FTL by Anonymous Coward · 2014-05-24 05:58 · Score: 0

OK, I was playing fast and loose with the terminology. Wear leveling (making sure that all flash cells age about equally) is just one function of the FTL (flash translation layer). Other purposes are increased write performance and overall longevity (by not requiring a full read-modify-write cycle for every write) and increased read performance (by distributing data across several flash chips to be read in parallel).

The main point however holds: The story describes a particular write strategy for an FTL. Like other write strategies, it has to balance various competing objectives (e.g. write speed vs. wear). It's not like SSD manufacturers have been sitting on their hands. Intel for example claimed a write amplification of just 1.1 for their enterprise grade SSDs in 2008. If the controller in those SSDs really only ever wrote once to each erase block between erases, a write amplification that low would be unachievable except for highly contiguous accesses.

What this new middleware does (supposedly differently, but with most FTLs being proprietary, it's hard to tell) is write preferentially to blocks which are about to be erased. These writes are essentially free, since it's not the write that kills the cells and the erasing is happening anyway, empty blocks or not. As I wrote before, this is just a particular write strategy, a particular way of using the FTL to schedule writes. Without information about the write strategy against which it is compared, it's difficult to see the benefit of this strategy, if any.

Middleware? by Anonymous Coward · 2014-05-24 07:40 · Score: 0

I guess middleware is another overloaded technical term since I don't see any databases mentioned.

Already being done by dutchwhizzman · 2014-05-24 08:12 · Score: 2

Most flash drives have some RAM cache and most erasing is done as a background task by the on-board firmware of the drive. Part of flash drive reliability has to do with having big enough capacitors on board so that, after a power failure, the drive can still write enough data to flash to reach a consistent state, at least for its own bookkeeping data on blocks and exposed data. On top of that, the enterprise ones usually have enough capacitors to write to flash all data that has been reported to the OS as "we wrote this to the drive".

The big difference here seems to be that they don't erase at block level any more, and a change to just a few bytes in a block doesn't lead to the whole block in its new iteration being written to an empty block and the old block being tagged with a "trim". While this is beneficial for throughput, you have to make certain you will not do this indefinitely, since wear leveling algorithms aren't used for nothing. You'll still need to do a certain percentage of rewrites, or keep count of the number of rewrites to the same block and, once your counter hits a limit, do a rewrite of the entire block to a "fresh" location.

--
I was promised a flying car. Where is my flying car?
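The counter idea in the comment above could look something like the following. It is a hypothetical policy sketch: `REWRITE_LIMIT`, `BlockTracker`, and the block numbers are all made up for illustration, and nothing here is taken from the article or from a real controller.

```python
# Hypothetical policy: allow a few in-place updates, then relocate the block.
REWRITE_LIMIT = 3   # assumed threshold, not a figure from the article


class BlockTracker:
    def __init__(self, fresh_blocks):
        self.fresh_blocks = list(fresh_blocks)  # erased physical blocks
        self.location = {}                      # logical block -> physical block
        self.rewrites = {}                      # logical block -> in-place updates so far

    def update(self, logical):
        """Return the physical block that receives this update."""
        if logical not in self.location:
            # First write: place the block in a fresh location.
            self.location[logical] = self.fresh_blocks.pop(0)
            self.rewrites[logical] = 0
        elif self.rewrites[logical] >= REWRITE_LIMIT:
            # Limit reached: move the whole block to a fresh location.
            self.location[logical] = self.fresh_blocks.pop(0)
            self.rewrites[logical] = 0
        else:
            self.rewrites[logical] += 1
        return self.location[logical]


tracker = BlockTracker(fresh_blocks=[10, 11, 12])
history = [tracker.update(0) for _ in range(6)]
```

Here the first write and three in-place updates stay on physical block 10, and the next update forces a move to block 11, so no single block absorbs updates indefinitely.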

bad idea - thrashing directory blocks by dltaylor · 2014-05-24 11:43 · Score: 1

I've written drivers for solid state media. There is a cost to finding the "next available block" for incoming data. Often, too, it is necessary to copy the original instance of a media block to merge new data with the old. Then, you can toss the old block into a background erase queue, but the copy isn't time-free, either.

Since so-called Smart Media didn't have any blocks dedicated to the logical-physical mapping (it was hidden in a per-physical-block logical id), there was also a startup scan required.

If the middleware is constantly trying to use the same physical block to represent a logical block (something even rotating media is giving up on), the physical block is going to take a pounding if it is used for frequently-updated storage. Losing a directory block due to cell damage is not my idea of a good thing.

What I suspect they're really trying to do is reduce the number of blocks dedicated to logical-physical mapping. That lets them ship more parts with from-fab defective blocks at a given capacity out of the die.
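The startup scan mentioned above, where the mapping lives only in a per-physical-block logical id, can be sketched roughly as follows. The list-of-ids layout and the `rebuild_map` name are assumptions for illustration, not the actual Smart Media format.

```python
# Sketch of a mount-time scan: each physical block carries its logical id
# in its own metadata (the layout here is assumed for illustration).
def rebuild_map(physical_blocks):
    """physical_blocks: list of logical ids, or None for erased/unused blocks."""
    logical_to_physical = {}
    for phys, logical_id in enumerate(physical_blocks):
        if logical_id is not None:
            logical_to_physical[logical_id] = phys
    return logical_to_physical


media = [2, None, 0, 1, None]
mapping = rebuild_map(media)
```

Every block has to be visited once before the first access, which is the startup cost the comment refers to. A real driver would also need a rule for the case where an old and a new copy of the same logical block both survive.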
