New Middleware Promises Dramatically Higher Speeds, Lower Power Draw For SSDs
mrspoonsi (2955715) writes "A breakthrough has been made in SSD technology that could mean drastic performance increases due to the overcoming of one of the major issues in the memory type. Currently, data cannot be directly overwritten onto the NAND chips used in the devices. Files must be written to a clean area of the drive whilst the old area is formatted. This eventually causes fragmented data and lowers the drive's life and performance over time. However, a Japanese team at Chuo University have finally overcome the issue that is as old as the technology itself. Officially unveiled at the 2014 IEEE International Memory Workshop in Taipei, the researchers have written a brand new middleware for the drives that controls how the data is written to and stored on the device. Their new version utilizes what they call a 'logical block address scrambler' which effectively prevents data being written to a new 'page' on the device unless it is absolutely required. Instead, it is placed in a block to be erased and consolidated in the next sweep. This means significantly less behind-the-scenes file copying that results in increased performance from idle."
Could the incoming data be written first to either a RAM or SLC cache while the formatting is going on?
http://techon.nikkeibp.co.jp/english/NEWS_EN/20140522/353388/?SS=imgview_e&FD=48575398&ad_q
They came up with a better scheme for mapping logical to physical addresses. However, the results aren't as good as all the news sources say.
I don't doubt that the researchers have hit on something interesting, but it's hard to make heads or tails of this article without knowing what algorithms they're comparing it to. The major SSD manufacturers - Intel, Sandforce/LSI, and Samsung - all already use some incredibly complex scheduling algorithms to collate writes and handle garbage collection. At first glance this does not sound significantly different than what is already being done. So it would be useful to know just how the researchers' algorithm compares to modern SSD algorithms in both design and performance. TFA as it stands is incredibly vague.
I was looking into that when I was checking out alternatives to sub-gigabyte hard drives to keep legacy systems ( DOS and the like ) alive.
... but another system ( say an old IBM ThinkPad ) won't recognize it. However a true magnetic drive swaps out nicely - albeit the startup files may need to be changed from one system to another.
SanDisk's CompactFlash memory cards ( intended for professional video cameras ) seemed to make great SSDs for older DOS systems when fitted with a CF to IDE adapter. I can format smaller CF cards to FAT16 ( using the DOS FDISK and FORMAT commands, very much like installing a raw magnetic drive ). With the adapter, the CF card looks and acts like a magnetic rotating hard drive. I had a volley of emails with SanDisk, and the gist of it was that they did not advertise using their product in this manner and did not want to get involved in support issues, but it should work. They told me they had wear leveling algorithms in place, which was the concern behind my emails in the first place: I was very worried the File Allocation Table area would be very short lived because of how frequently it is overwritten. I would not like to give my client something that only works for a couple of months - that goes against everything I stand for.
So, I have a couple of SanDisk memories out there in the field on old DOS systems still running legacy industrial robotics... and no problems yet.
Apparently the SanDisk wear-leveling algorithms are working.
I can tell you this works on some systems, but not on others, and I have yet to figure out why. I can even format and have a perfectly operational CF in the adapter plate so it looks ( both physically and supposedly electronically ) like a magnetic IDE drive in one system
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Many industrial computers have CF-card slots for this very application. I put together a few MS-DOS systems using SanDisk CF cards around 8 years ago and they're still going strong, using a variant of one of these cards which has a CF slot built-in (so no need for a CF -> IDE adapter): PCA-6751
In the original-ish article here they go into a bit more detail but the "conventional scheme" they're comparing against appears to be just straight mapping. It would be interesting to see how this stacks up against some of the more advanced schemes employed in today's SSDs.
Fragmentation of data doesn't even affect the speed.
Is this completely true? Because benchmarks show that even SSDs can read larger chunks much faster than small ones. So if a big file exists mostly on adjacent flash cells, it would be faster to read? Of course, operating-system-level defragmentation might not be very useful, because the physical data might be mapped into completely different areas due to wear leveling. Thus the drive would have to perform defragmentation internally.
If data is fragmented over multiple blocks, it requires multiple reads. But this kind of fragmentation is not as bad as on an HDD, where you had a seek time of 7-8 ms. Matching the block size of the SSD to the block size of the FS is an effective performance enhancement.
Modern SSDs have read limits. Every 10,000 reads or so the data has to be refreshed. The firmware does this silently.
Per how big data areas is wear leveling performed in an SSD? Maybe not for each 4kB block, because that would require hundreds of megabytes of extra data just for the remap pointers, if we assume that they each are 48 bits long. Also TRIM data (which blocks are "nuked" and not just zeroes) requires similar kind of extra space.
The linked article is pretty bad. This link has a little more information. Apparently the saving they claim comes from filling the pages that already have valid data more completely rather than writing to new pages within the same block (the reduced fragmentation claim); then the garbage collector has fewer pages to relocate when erasing that block (the speed-up claim). Of course if the garbage collection happens in the background the savings are moot.
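A rough way to picture that claim, as a toy model only: the cost of garbage-collecting a block is the number of still-valid pages that must be copied out before the erase. The page states, the 8-pages-per-block layout and the function names below are made up for illustration and are not taken from the paper.

```python
# Toy model: a block is a list of page states (illustrative names only).
VALID, INVALID, FREE = "valid", "invalid", "free"

def gc_relocation_cost(block):
    """Number of pages the garbage collector must copy before erasing `block`."""
    return sum(1 for page in block if page == VALID)

def pick_victim(blocks):
    """Greedy choice: index of the block with the fewest valid pages to relocate."""
    return min(range(len(blocks)), key=lambda i: gc_relocation_cost(blocks[i]))

# Two hypothetical blocks of 8 pages each.
scattered = [VALID, INVALID, VALID, INVALID, VALID, INVALID, VALID, FREE]
consolidated = [VALID, VALID, INVALID, INVALID, INVALID, INVALID, INVALID, FREE]

print(gc_relocation_cost(scattered))         # 4
print(gc_relocation_cost(consolidated))      # 2
print(pick_victim([scattered, consolidated]))  # 1
```

Fewer valid pages in the victim block means fewer copies per erase, which is where the claimed saving would come from if the collector runs in the foreground.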
Per how big data areas is wear leveling performed in an SSD? Maybe not for each 4kB block...
IIRC the erase/write block size is typically 128KB.
IBM ThinkPads want both ATA SECURITY and UNLOAD IMMEDIATE. If they don't detect it, they will bitch about it.
It was 128 KB for smaller, older drives. For instance, the Samsung 840 EVO series uses an erase block size of 2 MB. Some devices even have an 8 MB erase block size. 8 KB page sizes are common now, too, much like how spinning rust moved to 4 KB sectors. Using larger pages and blocks allows for denser, cheaper manufacturing.
Be relentless!
Wear leveling is typically a system by which you write new data to the least-written empty block available, usually with some sort of data-shuffling involved to keep "stagnant" data from preventing wear on otherwise long-occupied sections. It sounds like this is a matter of not erasing the block first: For example if the end of a file has used 60% of a block and is then deleted, the SSD can still use the remaining 40% of the block for something else without first deleting it. Typically, as I understand it, once a block is written that's it until its page is erased - any unused space in a block remains unused for that erase cycle. This technique would allow all the unused bits at the end of the blocks to be reused without an expensive erase cycle, and then when the page is finally ready to be erased all the reused bits on the various blocks can be consolidated to fill a few fresh blocks.
It seems to me this could be a huge advantage for use cases where you have a lot of small writes so that you end up with lots of partially filled blocks. Essentially they've introduced variable-size blocks to the SSD so that one physical block can be reused multiple times before erasure, until all available space has been used. Since erasing is pretty much the slowest and most power-hungry operation on the SSD that translates directly to speed and power-efficiency gains.
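The basic wear-leveling rule described above (write to the least-written empty block) can be sketched in a few lines. This is only a minimal illustration; the `erase_counts` table and `pick_block` name are hypothetical, and real firmware also shuffles stagnant data as mentioned.

```python
def pick_block(free_blocks, erase_counts):
    """Return the free block that has been erased the fewest times so far."""
    return min(free_blocks, key=lambda block: erase_counts[block])

# Hypothetical state: block id -> erase cycles so far.
erase_counts = {0: 12, 1: 3, 2: 7, 3: 3}
free_blocks = [0, 2, 3]          # block 1 is still holding data

chosen = pick_block(free_blocks, erase_counts)
print(chosen)                    # 3
```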
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Because benchmarks show that even SSDs can read larger chunks much faster than small ones
Well, why shouldn't prefetching and large block reading work on the SSD controller level? I assume that Flash chips are still slower than DRAMs, and the controller has to do some ECC work, not to mention figuring out where to read from (which may also be something that is kept in Flash chips in non-volatile form, unless you want your logical-to-physical mapping completely scrambled when the drive is turned off). So prefetching the data into the controller's memory should help hide latencies even if the Flash chips themselves are true RAM chips, as in the equal random access time to all blocks thingy, and it should work most efficiently when reading operations are performed on larger chunks at once.
Ezekiel 23:20
If you're reading a lot of small files, that's a lot of open/read/close commands. If you're reading a big file, that's one open command, multiple sequential read commands, one close command.
And if it's anything like SPI, there's not even multiple read commands, you just keep clocking to read the data sequentially.
I'm not sure about NAND flash, which is a block device, but in NOR flash sequential reads are faster due to prefetching, where the next memory word is read before the CPU has finished processing the first one. For NAND, I'd imagine you could start caching the next page. Not sure if that's actually done, though.
"Currently, data cannot be directly overwritten onto the NAND chips used in the devices. Files must be written to a clean area of the drive whilst the old area is formatted"
Am I the only one that knows that's not remotely true? I don't even know where to start. So the SSD wants to write to location 0x00000032 but it's occupied by old data. First of all, no it isn't. TRIM already took care of that. But let's say you're using the SSD in Windows XP so TRIM doesn't work. So they claim the SSD writes data to a blank location on the drive temporarily, then erases the original intended location and later moves it back to that location to be contiguous? What's so damn special about that location? Just leave it in the blank location. They claim that causes fragmentation, which has no impact on the performance of an SSD in any way.
This is a useless invention from people who don't know how SSDs work.
Tracking 4 KB blocks wouldn't be that bad for metadata. Like you said, assume 48-bit pointers, then some extra metadata, so 64 bits, which is 8 bytes. 1 GB is 262,144 4 KB blocks, which is only about 2 MB of metadata per 1 GB, or about 0.2% overhead. They over-provision something like 10%-30% just for wear leveling.
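Spelling out that arithmetic, assuming binary units (1 GB = 1024^3 bytes) and the 8-byte entry size guessed above; the constant names are just for illustration:

```python
PAGE_SIZE = 4 * 1024        # 4 KB mapping granularity
ENTRY_SIZE = 8              # bytes per map entry (48-bit pointer padded to 64 bits)
CAPACITY = 1024 ** 3        # 1 GB of flash, binary units assumed

entries = CAPACITY // PAGE_SIZE       # number of 4 KB blocks to track
table_bytes = entries * ENTRY_SIZE    # size of the remap table
overhead = table_bytes / CAPACITY     # fraction of capacity spent on the table

print(entries)              # 262144
print(table_bytes)          # 2097152 (2 MB)
print(f"{overhead:.2%}")    # 0.20%
```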
...is that in a properly-designed SSD, there is no such thing as data fragmentation. You lay out the nand as a circular log and write to every bit of it once before you overwrite, and maintain a set of pointers that translates LBA to memory addresses.
Pretty much every SSD vendor out there has figured this out a few years ago.
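For anyone who hasn't seen the circular-log idea, here is a deliberately tiny sketch. It ignores erase-block granularity, garbage collection and over-provisioning, and the `LogFTL` class and its fields are invented for illustration, not any vendor's design.

```python
class LogFTL:
    """Toy log-structured translation layer: writes always go to the log head."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # physical pages
        self.owner = [None] * num_pages   # LBA whose live data sits in each page
        self.head = 0                     # next physical page to try
        self.lba_to_phys = {}             # logical block address -> physical page

    def write(self, lba, data):
        n = len(self.flash)
        for _ in range(n):
            phys = self.head
            self.head = (self.head + 1) % n       # advance around the ring
            if self.owner[phys] is None:          # free or stale page
                old = self.lba_to_phys.get(lba)
                if old is not None:
                    self.owner[old] = None        # previous copy becomes stale
                self.flash[phys] = data
                self.owner[phys] = lba
                self.lba_to_phys[lba] = phys
                return phys
        raise RuntimeError("no free or stale pages left")

    def read(self, lba):
        return self.flash[self.lba_to_phys[lba]]


ftl = LogFTL(4)
placements = [ftl.write(7, "a"), ftl.write(7, "b"), ftl.write(9, "c"),
              ftl.write(7, "d"), ftl.write(9, "e")]
print(placements)               # [0, 1, 2, 3, 0]
print(ftl.read(7), ftl.read(9)) # d e
```

Rewriting LBA 7 three times never touches the same physical page twice in a row; the host only ever sees the LBA, so "fragmentation" at the logical level says nothing about physical placement.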
up 12 days, 22:30, 2 users, load averages: 993.20, 994.21, 994.56
*makes note to limit user processes...
> Modern SDD have read limits. Every 10.000 reads or so the data has to be refreshed. The firmware will do this silent.
Please provide reference(s). I have never seen any indication of this, or at least there is no read limit for the flash memory itself. You can read from it indefinitely just like static RAM, without "refresh" as required for DRAM.
Have I stumbled into a new green themed version of LWN? The comments here are far too insightful and interesting for the usual /. fare. Can't even find the frist post.
Google: flash read disturb
The Micron presentation is rather old, but gives a good overview of how Flash works.
That *IS* the basic problem.
However, they have gotten "good enough" for most use cases. Though I agree that it is in the "just barely" category. Limited rewrites are their one major problem at this time. If that can be improved, it would be a great advance for us all.
You're in luck, as that time is right now!
http://techreport.com/review/2...
tl;dr - the Samsung 840 series is the only drive to really suffer problems, and only relatively speaking; it's allocating from reserve capacity, and to reach the point it's at now you'd have to have 150 GB of writes per day for 10 years, which is probably at least an order of magnitude higher than even a heavy standard user. And that's the consumer version -- the Intel SSD, aimed more at production / business environments, fares even better. Which mechanical hard drives do you use that support 150 GB of daily writes for 10 years?
Due to circumstances beyond my control, I am master of my fate and captain of my soul.
Most flash drives have some RAM cache, and most erasing is done as a background task by the drive's on-board firmware. Part of flash drive reliability has to do with having big enough capacitors on board so that, on a power failure, the drive can write enough data to flash to leave at least its own bookkeeping data on blocks and exposed data in a consistent state. The enterprise ones usually have enough capacitors to also write to flash all data that has been reported to the OS as "we wrote this to the drive".
The big difference here seems to be that they don't erase at block level any more, so a change to just a few bytes in a block doesn't lead to the whole block, in its new iteration, being written to an empty block and the old block being tagged with a "trim". While this is beneficial for throughput, you have to make certain you do not do this indefinitely, since wear-leveling algorithms aren't used for nothing. You'll still need to do a certain percentage of rewrites, or keep count of the number of rewrites to the same block and, once your counter hits a limit, rewrite the entire block to a "fresh" location.
I was promised a flying car. Where is my flying car?
I've written drivers for solid state media. There is a cost to finding the "next available block" for incoming data. Often, too, it is necessary to copy the original instance of a media block to merge new data with the old. Then you can toss the old block into a background erase queue, but the copy isn't time-free, either.
Since so-called Smart Media didn't have any blocks dedicated to the logical-physical mapping (It was hidden in a per-physical-block logical id), there was also a startup scan required.
If the middleware is constantly trying to use the same physical block to represent a logical block (something even rotating media is giving up on), the physical block is going to take a pounding if it is used for frequently-updated storage. Losing a directory block due to cell damage is not my idea of a good thing.
What I suspect they're really trying to do is reduce the number of blocks dedicated to logical-physical mapping. That lets them ship more parts with from-fab defective blocks at a given capacity out of the die.
Thanks!
I was wondering why my ThinkPads would not see these.
Thank you for the link, CSI! I did not know about that one. It looks like a very handy little board that can retrofit into other ISA systems. ( Yes, I can get desperate enough to fire up Eagle and layout a custom ISA motherboard for something like this if the dying dinosaur is important enough ).
I believe Advantech will still happily sell you ISA backplanes. At the same time I put these things together, I had to reverse-engineer and fabricate some old I/O cards which had "unique" (incompatible with readily available cards) interrupt register mappings, also with EAGLE - great software!
I should mention: the MS-DOS system has outlasted three replacement attempts (two windows-based applications were from the original vendor who sold the MS-DOS system). There's just something completely unbreakable about the old stuff.
GP is correct about read disturb. NAND vendors will specify a policy for a given part, but it is typically N reads to a particular area (i.e. one block, which is 256 or 512 or 1024 pages), after which that area must be erased. So even if page 7 in a block is never read, but page 100 is read a lot, the drive will have to rewrite that whole block eventually.
(I work for a NAND controller vendor.)
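A sketch of what that per-block read counting might look like, under loud assumptions: the limit value, the `Block` class and the `refresh` callback are all hypothetical, and real parts follow whatever policy the NAND vendor specifies.

```python
class Block:
    """Toy erase block: some pages plus a count of reads since the last refresh."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.reads = 0


def read_page(block, index, limit, refresh):
    """Read one page; once the block has seen `limit` reads, hand it to `refresh`."""
    block.reads += 1                 # any page read counts against the whole block
    data = block.pages[index]
    if block.reads >= limit:
        refresh(block)               # firmware would rewrite the block elsewhere
        block.reads = 0
    return data


block = Block(["p0", "p1", "p2"])
refreshed = []                       # stand-in for the firmware's rewrite queue

# Hammer a single page; the whole block is still what gets refreshed.
for _ in range(3):
    read_page(block, 1, limit=3, refresh=refreshed.append)

print(len(refreshed), block.reads)   # 1 0
```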