With Optane Memory, Intel Claims To Make Hard Drives Faster Than SSDs (pcworld.com)
SSDs are generally faster than hard drives. However, they are also usually more expensive. Intel wants to change that with its new Optane Memory lineup, which it claims is faster and better performing than SSDs while not requiring customers to break their banks. From a report on PCWorld: Announced Monday morning, these first consumer Optane-based devices will be available April 24 in two M.2 trims: A 16GB model for $44 and a 32GB Optane Memory device for $77. Both are rated for crazy-fast read speeds of 1.2GBps and writes of 280MBps. [...] When the price of a 128GB SATA SSD is roughly $50 to $60 today, you may rightly wonder why Optane Memory would be worth the bother. Intel says most consumers just don't want to give up the capacity for their photos and videos. PC configurations with a hard drive and an SSD, while standard for higher-end PC users, isn't popular for the newbies. Think of the times you've had friends or family fill up the boot drive with cat pictures, but the secondary drive is nearly empty. Intel Optane Memory would give that mainstream user the same or better performance as an SSD, with the capacity advantage of the 1TB or 2TB drive they're used to. Intel claims Optane Memory performance is as good or better than an SSD's, offering better latency by magnitudes and the ability to peak at much lower queue depths.
Wouldn't SSDs be more energy efficient?
So these high-priced, low-capacity drives are meant to fill the need for low-priced, high-capacity drives?
Shouldn't the summary at least attempt to fill in the gaps here?
Smoke. Total and complete nonsense. Why would I want to buy their over-priced Optane junk versus a Samsung 951* or 960* NVMe drive? Far more storage for around $115-$130, 1.4 GBytes/sec consistent read performance, decent write performance, and decent durability.
P.S. the Intel 600P NVMe drive is also horrid, don't buy it.
http://apollo.backplane.com/DF...
-Matt
The way Intel plans on using Optane memory, yes, it will most certainly improve the speed of HDs by caching, but to say it will always outperform an SSD is an outright lie. For starters, if you're working with unusually large datasets, they likely won't all fit in Optane memory, and unless your cache is highly intelligent and can read ahead, things will likely load slowly on the first attempt. Then for laptops, an SSD also has the bonus of not being destroyed if your laptop gets bumped the wrong way or treated with a bit of abuse while operating. If this worked so well, Seagate's hybrid SSD / HD drives would be almost everywhere, but they aren't.
They are saying that an SSD cache for an HDD is rare because most people only have one device, but somehow, by being more expensive per GB, this has a better chance of being a common configuration? This pitch is sufficiently convoluted that I can't help but wonder how worried/challenged they must be to find a wider market for the technology, given the price point.
This seems to be an unfortunate reality of PC storage, the vast majority of the market is entrenched in 'good enough'. Even NVMe is a relative rarity, despite getting more performance out of NAND SSD than SATA connection. A bump for the general order of magnitude improvement that is NAND.
A better angle could be to replace additional memory capacity (sometimes padded out for more disk cache) with an Optane, but even then most desktops seem 'fine' at 4GB of ram. This *is* much cheaper than ram, and probably fast enough so that we don't *need* to cache to ram, so that might not be so bad.
XML is like violence. If it doesn't solve the problem, use more.
Intel is marketing the Optane Memory M.2 modules as caches for hard drives.
"Lather, rinse, repeat. With each duplicate task, the launching speed accelerated. The load time for Gimp, for example, dropped from about 14 seconds to 8 seconds, and then to 3 or 4 seconds as the Optane Memory cached the task."
That's only speeding up accesses for repeated tasks (which, granted, there are many of).
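A rough way to see why only repeated tasks benefit is to simulate a small block cache. This is a minimal sketch: the least-recently-used policy, the block counts, and the names `LRUCache` and `launch` are illustrative assumptions, not a description of how Intel's caching software actually works.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used block cache (illustrative only)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()

    def read(self, block_id):
        """Return 'cache' on a hit, 'disk' on a miss (and cache the block)."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return "cache"
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return "disk"


def launch(cache, app_blocks):
    """Read every block an application needs; count reads served by the disk."""
    return sum(1 for block in app_blocks if cache.read(block) == "disk")


# An application whose blocks fit in the cache: slow once, fast afterwards.
small_cache = LRUCache(capacity_blocks=1000)
first_launch = launch(small_cache, range(300))
second_launch = launch(small_cache, range(300))

# A sequential working set larger than the cache never gets a hit under LRU.
big_cache = LRUCache(capacity_blocks=1000)
big_first = launch(big_cache, range(2000))
big_second = launch(big_cache, range(2000))

print(first_launch, second_launch, big_first, big_second)
```

The first launch reads everything from the hard drive; only the repeat is served from the cache, and a working set bigger than the cache gets no help at all under this policy.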
I think the problem Intel found is that Optane memory is too expensive right now in larger sizes. They came up with this cache module as their best way to market it. Is someone really going to spend $77 for a 32GB cache device when they can just spend $99 for a 256GB SSD?
Too bad that Intel's PCIe lanes suck on their desktop CPUs.
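The price gap is easier to judge per gigabyte. A quick sketch using only the prices quoted in this thread ($44 and $77 for the Optane modules, $99 for a 256GB SSD); the helper name `dollars_per_gb` is just for illustration.

```python
def dollars_per_gb(price_usd, capacity_gb):
    """Cost of one gigabyte at the quoted retail price."""
    return price_usd / capacity_gb


# Prices and capacities as quoted in the summary and the comment above.
devices = {
    "Optane Memory 16GB": (44, 16),
    "Optane Memory 32GB": (77, 32),
    "256GB SSD": (99, 256),
}

per_gb = {name: dollars_per_gb(price, cap) for name, (price, cap) in devices.items()}
for name, cost in per_gb.items():
    print(f"{name}: ${cost:.2f}/GB")

# How many times more a gigabyte of the 32GB module costs than a gigabyte of SSD.
premium = per_gb["Optane Memory 32GB"] / per_gb["256GB SSD"]
print(f"premium: {premium:.1f}x")
```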
AMD has X16 or X8 X8 (video) + X4 (storage) + USB 3.X on die + X4 chip set link VS intel with X16 or X8 X8 (video) + X4 chipset link.
You're doing it wrong. Rather than looking for a good shot at just the right moment, you shoot lots of pictures hoping at least one out of the hundred looks decent. And you keep the unused 99 others around because you're too lazy to erase them all.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Having a hard time imagining the use case for this.
For consumer gear, almost any SSD sold today will be faster than someone would ever need. Just use that as a cache and save some money.
For pro/enthusiast gear, money would probably be better invested simply getting more RAM -- with 32GB, in many cases I have 20GB or more of that being used as a filesystem cache. Cache tends to very rapidly exhibit diminishing returns, to the point where I doubt I'd even notice an extra 32GB sandwiched between my RAM and SSD.
Maybe as a non-volatile cache for large bursts of writes?
They might fill the need, but until then their R&D costs need to be earned back. So let's look forward a few years or so, to when the people who believed this marketing crap have bought those devices and by doing so made them cheaper.
Optane is Intel's name for 3D Xpoint storage. Right now, it's more expensive than NAND storage, and is only available in smaller capacities. That is why they are using it as cache on conventional hard drives. When it becomes cheaper to produce, and in higher capacities, it's going to be great. It will be way faster than NAND, and you won't have to worry about wear-levelling because it doesn't suffer from insulator breakdown.
Yeah, it is not clear from the summary, reading it I thought it was about hybrid drives, but the sizes don't make sense.
So, these are M.2 expansion cards which offer a big and very fast cache for your existing hard drive.
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
That's about my wife. She will take about 20 photos of the exact same shot from the exact same angle to try and get the best picture and not delete a single one.
I, on the other hand will take three photos from different angles- and then more often than not, I will delete all three photos.
"That's the way to do it" - Punch
All new storage technologies start with a significant price premium vs established technology.
$77 and 32GB is not intended for photos and videos (which is all consumers think about), they're intended for servers which need high speed but not a great deal of storage space per drive. $2 per GB is roughly what we saw with SSD when they first came out.
For someone running a home server, these drives are a feasible replacement for their existing database and web storage to get much better performance.
For commercial providers, these are going to start replacing SSDs in RAID arrays as the capacities go up.
Work Safe Porn
A lot of products flat out fail trying to recover R&D expenses. I am not saying this is one of those, as Intel has huge resources behind any tech it brings to market.
The idea here (in the long run), is that Drives and "memory" become the same space. Instant on, fast access to Nonvolatile RAM, and RAM becomes equivalent to 4 tier processor cache.
I've long predicted that memory space is going to be flattened out and everything is going to be mapped as one big logical drive, measured in access speed to data that is frequently needed. Closer / Faster, Further / Slower
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
With 64 bit memory addresses, there's no need to differentiate memory vs drive space. Just let the swap manager decide what goes where in the physical world, and each process gets its own dedicated pages of a single memory space.
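Operating systems already expose a version of this through memory-mapped files. Below is a minimal sketch using Python's standard `mmap` module: a temporary file stands in for "drive space" and is then read and written like ordinary memory. The file and its one-page size are assumptions made for the demonstration.

```python
import mmap
import os
import tempfile

# A 64-bit address space spans 2**64 bytes, i.e. 16 EiB.
ADDRESS_SPACE_EIB = 2**64 / 2**60

# Create a one-page backing file standing in for "drive space".
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\0" * mmap.PAGESIZE)
    # Map the file into this process's address space and treat it as memory.
    with mmap.mmap(fd, mmap.PAGESIZE) as mem:
        mem[0:5] = b"hello"  # a plain memory write, no explicit write() call
        mem.flush()          # ask the OS to push the dirty page to the file
    # Read the file back through the normal file API to confirm it changed.
    os.lseek(fd, 0, os.SEEK_SET)
    on_disk = os.read(fd, 5)
finally:
    os.close(fd)
    os.remove(path)

print(ADDRESS_SPACE_EIB, on_disk)
```

The kernel's paging code decides when the mapped page actually lives in RAM and when it is written back, which is roughly the "let the swap manager decide" arrangement described above.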
The deal is they have a bunch of half-broken XPoint shit they need to sell off in some form to recoup some $.
XPoint (Currently "Optane" products from Intel) isn't fucking ready: http://semiaccurate.com/2016/0...
If Intel & Micron can get to the point where it fucking works as planned then it'll be great. But who the fuck knows if/when that'll actually happen. What you're seeing now is a broken mess that is shippable only because they're loading it up with tons of redundancy / overprovisioning for when it fails, and it works only at about the same speed as a high end SSD.
So far, having a solid state cache for a hard drive is an idea which looks great on paper, but practically everything that has been offered shows performance - and we're talking about real workloads and real user experiences - closer to the hard drive than to the solid state device. IMHO, since we apparently have a fairly large number of cache misses or some other anomalies, having a solid state cache which is 1000 times faster than the traditional NAND-based one won't make too much difference.
On the other hand, having a solid state device which is only 10 times slower than DDR would make it excellent virtual storage. You can put 64GB of DDR4 on your server and then get a 350GB slab of Optane. For all practical purposes you have 350GB of main memory. Swapping the working sets in and out would happen, for all practical purposes, instantly. But of course that's a solution for the data center, not for the regular user.
Those people are turning their cameras on more often than you do.
HTH.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
I think we're *eventually* going to wind up with a unified memory technology that flattens the memory space, but I don't think Optane is it.
When this was first a thing, the Optane access times were a couple of orders of magnitude off RAM. It really read like a newer/better/faster version of existing flash storage media. Of course the critical thing is "Can you make it price competitive with existing NAND?"
If they can't, it's going to be a tough sell. Existing NAND storage has gotten to be fast, durable, cheap and is growing in capacity. While you *can* use faster storage in front of slower capacity storage as a cache, existing NAND is so cheap now that everything is migrating to flash.
Caching works, but it's complex and has overhead penalties, which is one reason why all flash storage has grown in popularity. The consumer wants one drive, not two, and even the enterprise wants speed and simplicity.
I'm curious what Intel's problem is. Is it just an early production capacity problem or are there yield problems? Or did they drink their own Kool-Aid and think that people wanted to step back to multi-tier storage for their new cache chips?
More and more memory will be moved on die also. 50 years from now, we'll probably just have a single die that is the computer..
You're already modded 5 for this, but you deserve extra bonus mod points.
She will take about 20 photos of the exact same shot from the exact same angle to try and get the best picture and not delete a single one.
I've done that but usually with a tripod mounted camera but there it isn't to pick the best one. When I do that I am planning on combining them and doing things like focus stacking, HDR, or super resolution photography or a combination of them. For film I will also scan the negatives multiple times as well and combine them to reduce the noise and also produce images closer to the advertised resolution of the film scanner than can otherwise be achieved. Yes I have some photographs where I am getting 60-70 megapixels of actual data off of 35mm film but that requires having very fine film (50 speed Ilford B&W, 100 speed Kodak Ektar), a camera on a tripod, and the use of a high quality lens (my worst one diffraction wise is an f/4.5 500mm the best is an f/1.4 50mm one), and shooting with a wide open aperture. I love my full frame picture of Dome of the Rock that I took from on top of Mount of the Olives where I can clearly see the Islamic Calligraphy in the mosaic work on the outside with nice crisp lines.
Time to offend someone
It would depend on the relative latency and other characteristics. XPoint is definitely not it, because XPoint can't handle unlimited writing. But in some future, let's say we do have a non-volatile storage mechanism that has effectively unlimited durability, like RAM, but which is significantly more dense, like XPoint.
In that situation I can see systems supporting a chunk of that sort of storage as if it were memory.
Latency matters greatly here for several reasons. First, I don't think XPoint is quite fast enough, at least not yet. The problem with any sort of high-latency storage being treated like memory at the HARDWARE level is that the latency creates massive stalls on the cpu. Even DRAM today causes huge many-clock stalls on a cpu. These stalls are transparent to the operating system, so the operating system cannot just switch to another thread or do other work during the stall. The stall effectively reduces the performance of the system. This is the #1 problem with treating any sort of storage technology as if it were memory.
The #2 problem is that memory is far easier to corrupt than storage (which requires a block transaction to write). I would never want to map my filesystem's entire block device directly into memory, for example. It's just too dangerous.
The solution that exists today is, of course, swap space. You simply configure your swap on an SSD. The latencies are obviously much higher than they would be for a HW XPoint style solution, around 50-100uS to take a page-fault requiring I/O from a NVMe SSD, for example.
The difference though is that the operating system knows that it is taking the page-fault and can switch to another runnable thread in the mean time, so the CPU is not stalled for 50-100uS. It's doing other work. Given enough pending work, the practical overhead of a page-fault in terms of lost CPU time is only around 2-4uS.
In a XPoint-like all-hardware solution, the CPU will stall on the miss. If the XPoint 'pagein' time is 1-2uS, then the all-hardware solution winds up only being twice as good as the swap space solution in terms of CPU cycles. Of course, the all-hardware solution will be far better in terms of latency (1-2uS versus 50-100uS).
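The "only twice as good" figure follows directly from the numbers above. A rough sketch of that arithmetic, taking the midpoint of each quoted range; the miss count is an arbitrary assumption and cancels out of both ratios.

```python
def cpu_time_lost_us(events, cost_us_per_event):
    """Total CPU time (in microseconds) that does no useful work."""
    return events * cost_us_per_event


# Figures quoted in the comment, taken at the midpoint of each range.
SWAP_FAULT_CPU_COST_US = (2 + 4) / 2    # CPU time lost per page fault
SWAP_FAULT_LATENCY_US = (50 + 100) / 2  # wall-clock wait per page fault
HW_STALL_US = (1 + 2) / 2               # CPU is stalled for the whole access

MISSES = 1_000_000  # assumed number of misses, chosen only for illustration

swap_lost = cpu_time_lost_us(MISSES, SWAP_FAULT_CPU_COST_US)
hw_lost = cpu_time_lost_us(MISSES, HW_STALL_US)

cpu_advantage = swap_lost / hw_lost                       # CPU cycles saved
latency_advantage = SWAP_FAULT_LATENCY_US / HW_STALL_US   # wait time saved

print(f"CPU time: {cpu_advantage:.0f}x better, latency: {latency_advantage:.0f}x better")
```

The hardware path wins by a large factor on per-access latency but only by a small factor on wasted CPU time, because the operating system hides most of the swap latency by running other threads.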
But to really work in this format the non-volatile memory needs to have a nearly unlimited write capability. XPoint does not. XPoint only has around 33,000 write cycles of durability per cell (and that's being generous). It needs to be half a million at a minimum and at least 10 million to *really* be useful.
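To put those cycle counts in perspective, here is a back-of-the-envelope sketch. The 32GB capacity comes from the product announcement and the cycle counts from the paragraph above; the sustained 1 GB/s write rate, perfect wear leveling, and a write amplification of 1 are all assumptions picked to represent a memory-like workload.

```python
SECONDS_PER_DAY = 86_400


def days_until_worn_out(capacity_gb, cycles_per_cell, write_rate_gb_per_s):
    """Days of continuous writing before every cell hits its cycle limit.

    Assumes perfect wear leveling and no write amplification.
    """
    total_writable_gb = capacity_gb * cycles_per_cell
    return total_writable_gb / write_rate_gb_per_s / SECONDS_PER_DAY


CAPACITY_GB = 32        # the larger Optane Memory module
WRITE_RATE_GB_S = 1.0   # assumed sustained write rate, for illustration only

lifetimes = {
    cycles: days_until_worn_out(CAPACITY_GB, cycles, WRITE_RATE_GB_S)
    for cycles in (33_000, 500_000, 10_000_000)
}
for cycles, days in lifetimes.items():
    print(f"{cycles:>10,} cycles: {days:8.1f} days ({days / 365:.1f} years)")
```

Under these assumptions the quoted 33,000 cycles last under two weeks of continuous writing, while 10 million cycles stretch to roughly a decade.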
-Matt
DDR3-1600 RAM runs at 12.8GB/s. If we wanted to read at 1.2GB/s, couldn't we have a RAM chip, some fancy logic, and a delay line? That is, continuously clock the RAM contents around the delay line and then wait for them to come back in when you want to read them out.
Come to think of it, that just adds read latency, once your patch of delay line comes around you can read it at 12.8GB/s.
probably costs a ton of power, and of course it's volatile, but if 9/10ths of the memory is on the bus you get a lot of value for the RAM.
Nullius in verba
This is pretty much how computers used to be. Just a flat memory space and that's it. Lots of early computers ran the OS out of ROM and all user data was stored in RAM. Cartridge-based game systems simply map the cartridge ROM into memory space. Before cheap flash storage became available, early Palms and Windows CE devices stored user data and installed programs in battery-backed DRAM - and even had user-added programs specially compiled so they could be executed in place (since they were already stored in fast DRAM).
Block devices, file systems, etc. were originally devised because it became clear that there was a need for less expensive, nonvolatile, portable storage. Computers, however, can only work on data that is in their mapped memory space. Schemes to copy data in and out of memory from slower/cheaper/portable storage have always been kind of a hack.
This!
My first thought was exactly this. You can have a Samsung 960 EVO, that is three times faster in read and over five times faster in write speeds for only twice the money of that Intel module. And it has a capacity of 250 GB, not 32 GB. If Samsung would make a 960 EVO 128GB model, the entire Intel product line would be dead in the water. Oh, wait. They have, somewhat... the SM961 128GB, which is both faster and about as expensive as 32 Intel GBs.
Sorry Intel, and thanx for the deja-vu moment, for my second thought was: "Oh, my god, this is Intel Turbo Memory / Robson Modules (tm) all over again!"
You know, the tech they said would reach the market in 2016, then late 2016, then December 2016, then early 2017, and still doesn't show up in shopping.google.com today. When you miss your announced release dates that often, I guess the MO is to change the name and hope nobody notices.
That's how the AS400 works, single flat address space, every object with a permanent globally unique pointer, auto loaded on reference.
Certainly faster writing. Read speed is about the same for the EVO (on real blocks of uncompressible data, not the imaginary compressible or zeroed blocks that they use to report their 'maximum').
XPoint over NVMe has only two metrics that people need to know about to understand how it fits into the ethos: (1) More durability, up to 33,000 rewrites apparently (many people have had to calculate it, Intel refuses to say outright what it is because it is so much lower than what they originally said it would be). (2) Lower latency.
So, for example, NVMe devices using Intel's XPoint have an advertised latency of around 10uS. That is, you submit a READ request, and 10uS later you have the data in hand. The 960 EVO, which I have one around here somewhere... ah, there it is... the 960 EVO has a read latency of around 87uS.
This is called the QD1 latency. It does not translate to the full bandwidth of the device as you can queue multiple commands to the device and pipeline the responses. In fact, a normal filesystem sequential read always queues read-ahead I/O so even an open/read*/close sequence generally operates at around QD4 (4 read commands in progress at once) and not QD1.
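The relationship between latency and transactions per second can be sketched as follows. It assumes the per-request latency stays constant as the queue gets deeper, which real devices only approximate; the function name `iops_from_latency` is made up for this example.

```python
def iops_from_latency(latency_us, queue_depth=1):
    """Requests completed per second when `queue_depth` requests are always
    in flight and each one takes `latency_us` microseconds."""
    return queue_depth * 1_000_000 / latency_us


# Advertised XPoint latency versus the 960 EVO latency quoted above.
optane_qd1 = iops_from_latency(10)
evo_qd1 = iops_from_latency(87)
evo_qd4 = iops_from_latency(87, queue_depth=4)

print(f"Optane QD1 ceiling:  {optane_qd1:,.0f}/s")
print(f"960 EVO QD1 ceiling: {evo_qd1:,.0f}/s")
print(f"960 EVO QD4 ceiling: {evo_qd4:,.0f}/s")
```

At QD1 the latency gap translates directly into a throughput gap; a deeper queue lets the slower device claw part of it back.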
Here's the 960 EVO and some randread tests on it at QD1 and QD4.
nvme1: mem 0xc7500000-0xc7503fff irq 32 at device 0.0 on pci2
nvme1: mapped 8 MSIX IRQs
nvme1: NVME Version 1.2 maxqe=16384 caps=00f000203c033fff
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Interrupt Coalesce: 100uS / 4 qentries
nvme1: Disk nvme1 ns=1 blksize=512 lbacnt=488397168 cap=232GB serno=S3ESNX0J219064Y-1
(/dev/nvme1s1b is a partition filled with uncompressible data)
xeon126# randread /dev/nvme1s1b 4096 100 1 /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 1
device
11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
11458/s avg= 87.28uS bw=46.92 MB/s lo=68.50uS, hi=154.20uS stddev=7.01uS
11469/s avg= 87.19uS bw=46.98 MB/s lo=69.97uS, hi=151.97uS stddev=6.95uS
11477/s avg= 87.13uS bw=47.01 MB/s lo=69.31uS, hi=158.03uS stddev=7.03uS
And here is QD4 (really QD1 x 4 threads on 4 HW queues):
xeon126# randread /dev/nvme1s1b 4096 100 4 /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 4
device
44084/s avg= 90.74uS bw=180.57MB/s lo=65.17uS, hi=237.92uS stddev=16.94uS
44205/s avg= 90.49uS bw=181.05MB/s lo=65.38uS, hi=222.21uS stddev=16.56uS
44202/s avg= 90.49uS bw=181.04MB/s lo=65.19uS, hi=221.48uS stddev=16.72uS
44131/s avg= 90.64uS bw=180.75MB/s lo=64.44uS, hi=245.91uS stddev=16.81uS
44210/s avg= 90.48uS bw=181.08MB/s lo=63.73uS, hi=232.05uS stddev=16.74uS
So, as you can see, at QD1 the 960 EVO is doing around 11.4K transactions/sec and at QD4 it is doing around 44K transactions/sec. If I use a larger block size you can see the bandwidth lift off:
xeon126# randread /dev/nvme1s1b 32768 100 4 /dev/nvme1s1b bufsize 32768 limit 16.000GB nprocs 4
device
19997/s avg=200.03uS bw=655.26MB/s lo=125.02uS, hi=503.26uS stddev=55.24uS
20090/s avg=199.10uS bw=658.23MB/s lo=124.62uS, hi=522.04uS stddev=54.83uS
20034/s avg=199.66uS bw=656.47MB/s lo=123.63uS, hi=495.74uS stddev=55.59uS
20008/s avg=199.92uS bw=655.62MB/s lo=123.50uS, hi=500.24uS stddev=55.92uS
20034/s avg=199.66uS bw=656.47MB/s lo=125.17uS, hi=488.30uS stddev=55.02uS
20000/s avg=200.00uS bw=655.35MB/s lo=123.19uS, hi=504.18uS stddev=55.98uS
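The columns in these runs are tied together by simple arithmetic, which makes for a handy sanity check. The sketch below takes the first line of each of the three runs above and recomputes the bandwidth (reads per second times block size) and the number of reads in flight (reads per second times average latency). It assumes the tool reports decimal megabytes, which is what the figures appear to match.

```python
# (reads per second, average latency in uS, block size in bytes, reported MB/s)
SAMPLES = [
    (11737, 85.20, 4096, 48.07),     # 4KB blocks, 1 process
    (44084, 90.74, 4096, 180.57),    # 4KB blocks, 4 processes
    (19997, 200.03, 32768, 655.26),  # 32KB blocks, 4 processes
]


def bandwidth_mb_s(iops, block_bytes):
    """Throughput in decimal megabytes per second."""
    return iops * block_bytes / 1e6


def reads_in_flight(iops, avg_latency_us):
    """Average number of outstanding reads (Little's law)."""
    return iops * avg_latency_us / 1e6


results = []
for iops, avg_us, block_bytes, reported_bw in SAMPLES:
    bw = bandwidth_mb_s(iops, block_bytes)
    depth = reads_in_flight(iops, avg_us)
    results.append((bw, depth))
    print(f"bw={bw:7.2f} MB/s (reported {reported_bw}), in flight={depth:.2f}")
```

The computed depths come out at about 1, 4, and 4, matching the number of processes in each run.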
And if I use a deeper queue I can max out the bandwidth. On this particular device, random blocks of uncompressible data at 32KB limit out at around 1 GByte/sec. I'll also show 64KB and 128KB:
xeon126# randread /dev/nvme1s1b 32768 100 64 /dev/nvme1s1
device
Thanx for the numbers :) looks quite interesting, especially because I'm in the process of buying a fast SSD soon (new PC setup replacing my 7 year old Phenom and the motherboard will most probably have a PCI-e 3.x 4x M.2 slot). Latency increases with block size... but when you're going for bulk data, latency gets less important, I think. It's the commands for very small bits of data, I suppose, that you want to have with as little latency as possible. At 'various levels' of copying, my experience (just gut feeling, no numbers here) is that it's the small files that take up the most time. Whether at the PC internal storage level (copying a directory with random files, it flies through the first 75% of relatively large files, then takes 90% of the time to copy all the couple-of-KB-per-file junk), when using databases or at the network level (don't get me started on SMB overhead), whatever. Either overhead, or a (relatively) larger part of the execution time is latency, or both...
So, if the Optane has that insane low 'average' latency of 10 uS, do you think Intel has measured that at the optimum read blocksize (and as such it is an average over random positions in the NVRam you read from) or do they mean, with a 'typical' load of random blocksizes you get on average 10uS latency before the CPU can process the data... well, we'll see when people get their hands on them and can benchmark them.
I'm also very 'curious' if Optane will indeed beat a similarly priced (but obviously larger in volume) PCIe SSD as a HDD cache in actual real-world desktop circumstances (including using it as 'swapfile' if you want). And if it makes a difference that is actually noticeable for user experience. Of course the guaranteed durability for cell-writes is nice, but that will be (partly?) negated by the smaller storage volume of the device. Also, advertised durability doesn't indicate actual durability. I know of SSD tests where cheap (incidentally Samsung) SSDs could handle way more writes than advertised (where the benchmarker had to break off their testing after 100s of times the advertised writes - else they'd miss the publishing deadline of their article) while other SSDs barely hit their mark and then died completely.
The idea here (in the long run), is that Drives and "memory" become the same space. Instant on, fast access to Nonvolatile RAM, and RAM becomes equivalent to 4 tier processor cache.
This idea terrifies me. Currently, a reboot fixes everything but hardware issues. Once this goes live, only reinstalling from scratch will fix things.
"Someone needs to talk to the tree of liberty about its ghoulish drinking problem." by ohnocitizen
More and more memory will be moved on die also. 50 years from now, we'll probably just have a single die that is the computer..
No for two reasons:
1. Compare the amount of die area that the DRAM takes in a system with a reasonable amount of memory. It is way too much to be integrated with the CPU die.
2. High performance logic and bulk DRAM processes are different. Also operating the DRAM at the temperature of the CPU is a problem although acceptable in some cases.
The closest you may get is integrating the DRAM as part of a hybrid or multichip module however this will only work for systems with low memory requirements. GPUs are starting to go this way.
When this was first a thing, the Optane access times were a couple of orders of magnitude off RAM.
Optane access times are still too slow to replace DRAM.
While you *can* use faster storage in front of slower capacity storage as a cache, existing NAND is so cheap now that everything is migrating to flash.
Caching works, but it's complex and has overhead penalties, which is one reason why all flash storage has grown in popularity. The consumer wants one drive, not two, and even the enterprise wants speed and simplicity.
I'm curious what Intel's problem is.
Access times on Optane are such that these drives can support their maximum throughput at low queue depths unlike NAND Flash which requires a large number of queued transactions. In this respect, Optane requires *less* caching and buffering than NAND and apparently less processing in its translation tables. Is that enough? I do not know.
As a form of slow (but faster and lower latency than NAND Flash) non-volatile RAM (random access memory) in the traditional sense, which NAND Flash is not and never will be, maybe that is enough if it is attached to a CPU's auxiliary memory bus instead of a host adapter bus like NAND Flash.
Dissecting the test output:
11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
That means the average latency is 85uS (averaged over all reads), the lowest latency measured was 66uS and the highest was 140uS. Another important metric is the standard deviation... that is, how 'tight' access times are around that average latency of 85uS. In this case, a standard deviation of 7.5uS is very good.
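For anyone who wants to compare many such lines, the fields can be pulled apart mechanically. A small sketch, assuming the output format is exactly what is shown in this thread (the regular expression is inferred from those lines, not from any documentation of the `randread` tool):

```python
import re

LINE = "11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS"

# Field layout inferred from the pasted output; spacing after '=' varies.
PATTERN = re.compile(
    r"(?P<iops>\d+)/s\s+avg=\s*(?P<avg>[\d.]+)uS\s+bw=\s*(?P<bw>[\d.]+)\s*MB/s\s+"
    r"lo=\s*(?P<lo>[\d.]+)uS,\s+hi=\s*(?P<hi>[\d.]+)uS\s+stddev=\s*(?P<stddev>[\d.]+)uS"
)


def parse_randread_line(line):
    """Turn one result line into a dict of floats keyed by field name."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return {name: float(value) for name, value in match.groupdict().items()}


parsed = parse_randread_line(LINE)

# How tight the latencies are relative to the average (lower is tighter).
relative_spread = parsed["stddev"] / parsed["avg"]

print(parsed)
print(f"stddev is {relative_spread:.1%} of the average latency")
```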
Comparing this to the Optane: what Intel has stated is that the average latency over all reads for Optane NVMe will be around 10uS. They also stated that the standard deviation would be much tighter. So that is comparative.
But here's the real problem... you ask whether Optane will beat a PCIe SSD as a HDD cache in actual real-world desktop circumstances. I will add 'at the same price point'. The answer to that is going to be 'no'. The reason is that you can buy 4x to 8x the amount of NAND NVMe-based storage as you can Optane NVMe storage for the same price.
So instead of having a 32GB Optane cache, you could have a 128GB-256GB NAND SSD cache for the same price. That *completely* trumps Optane, no matter how low Optane's latency is, for this use case.
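A simple weighted-average model shows why capacity can matter more than latency here. The cache latencies are the 10uS and 87uS figures from this thread; the 10ms hard drive latency and both hit rates are assumptions picked purely for illustration, since real hit rates depend entirely on the workload.

```python
def effective_latency_us(hit_rate, cache_latency_us, hdd_latency_us):
    """Average access time when `hit_rate` of reads are served by the cache
    and the rest fall through to the hard drive."""
    return hit_rate * cache_latency_us + (1 - hit_rate) * hdd_latency_us


HDD_LATENCY_US = 10_000  # assumed: roughly 10ms per random hard drive read

# Assumed hit rates: the larger cache holds more of the working set.
optane_32gb = effective_latency_us(0.90, 10, HDD_LATENCY_US)
nand_256gb = effective_latency_us(0.98, 87, HDD_LATENCY_US)

print(f"32GB Optane cache: {optane_32gb:7.1f} uS average")
print(f"256GB NAND cache:  {nand_256gb:7.1f} uS average")
```

Because a miss costs about a thousand times more than a hit, a few extra points of hit rate outweigh the difference between 10uS and 87uS on the hits.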
-Matt