With Optane Memory, Intel Claims To Make Hard Drives Faster Than SSDs (pcworld.com)
SSDs are generally faster than hard drives. However, they are also usually more expensive. Intel wants to change that with its new Optane Memory lineup, which it claims is faster and better performing than SSDs while not requiring customers to break the bank. From a report on PCWorld: Announced Monday morning, these first consumer Optane-based devices will be available April 24 in two M.2 trims: A 16GB model for $44 and a 32GB Optane Memory device for $77. Both are rated for crazy-fast read speeds of 1.2GBps and writes of 280MBps. [...] When the price of a 128GB SATA SSD is roughly $50 to $60 today, you may rightly wonder why Optane Memory would be worth the bother. Intel says most consumers just don't want to give up the capacity for their photos and videos. PC configurations with a hard drive and an SSD, while standard for higher-end PC users, aren't popular with the newbies. Think of the times you've had friends or family fill up the boot drive with cat pictures, but the secondary drive is nearly empty. Intel Optane Memory would give that mainstream user the same or better performance as an SSD, with the capacity advantage of the 1TB or 2TB drive they're used to. Intel claims Optane Memory performance is as good as or better than an SSD's, offering better latency by magnitudes and the ability to peak at much lower queue depths.
Can wouldn't indeed!
They can wouldn't be more as even such as many though.
So these high-priced, low-capacity drives are meant to fill the need for low-priced, high-capacity drives?
Shouldn't the summary at least attempt to fill in the gaps here?
Smoke. Total and complete nonsense. Why would I want to buy their over-priced Optane junk versus a Samsung 951* or 960* NVMe drive? Far more storage for around $115-$130, 1.4 GBytes/sec consistent read performance, decent write performance, and decent durability.
P.S. the Intel 600P NVMe drive is also horrid, don't buy it.
http://apollo.backplane.com/DF...
-Matt
That's highly dependent on the woodening process.
The way Intel plans on using Optane memory, yes, it will most certainly improve the speed of HDs by caching, but to say it will always outperform an SSD is an outright lie. For starters, if you're working with unusually large datasets, they likely won't all fit in Optane memory, and unless your cache is highly intelligent and can read ahead, things will likely load slowly on the first attempt. Then for laptops, an SSD also has the bonus of not being destroyed if your laptop gets bumped the wrong way or treated with a bit of abuse while operating. If this worked so well, then Seagate's hybrid SSD / HD drives would be in almost everything, but they aren't.
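A toy simulation can make the two cases in the comment above concrete: a cold cache on the first pass, and a working set that is larger than the cache. This is only a sketch. It assumes a plain LRU policy with no read-ahead, and the block counts are invented for illustration; nothing quoted here describes how Intel's caching software actually decides what to keep.

```python
from collections import OrderedDict

def hit_rate(accesses, cache_blocks):
    """Replay a block access trace through an LRU cache; return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(accesses)

CACHE_BLOCKS = 1000             # hypothetical cache size, in blocks
small_job = list(range(800))    # working set fits in the cache
large_job = list(range(1500))   # working set does not fit

first_run = hit_rate(small_job, CACHE_BLOCKS)
repeat_runs = hit_rate(small_job * 3, CACHE_BLOCKS)
large_repeat = hit_rate(large_job * 3, CACHE_BLOCKS)

print(f"small job, first run:  {first_run:.0%}")
print(f"small job, three runs: {repeat_runs:.0%}")
print(f"large job, three runs: {large_repeat:.0%}")
```

Under these assumptions the first run never hits, repeated runs of the small job hit on every pass after the first, and a sequential scan larger than the cache never hits at all, because each block is evicted before it is needed again.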
Intel is marketing the Optane Memory M.2 modules as caches for hard drives.
"Lather, rinse, repeat. With each duplicate task, the launching speed accelerated. The load time for Gimp, for example, dropped from about 14 seconds to 8 seconds, and then to 3 or 4 seconds as the Optane Memory cached the task."
That's only speeding up accesses for repeated tasks (which, granted, there are many of).
I think the problem Intel found is that Optane memory is too expensive right now in larger sizes. They came up with this cache module as their best way to market it. Is someone really going to spend $77 for a 32GB cache device when they can just spend $99 for a 256GB SSD?
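The price question is easy to put in dollars per gigabyte. The sketch below uses only figures quoted in this thread (the Optane prices and the $50 to $60 SATA SSD range from the summary, the $99 256GB SSD from the comment above); the function name is just for illustration.

```python
# (name, capacity in GB, price in USD), all figures as quoted in this thread
drives = [
    ("Optane Memory 16GB", 16, 44.0),
    ("Optane Memory 32GB", 32, 77.0),
    ("128GB SATA SSD (upper quoted price)", 128, 60.0),
    ("256GB SSD", 256, 99.0),
]

def dollars_per_gb(capacity_gb, price_usd):
    """Price per gigabyte of raw capacity."""
    return price_usd / capacity_gb

for name, cap, price in drives:
    print(f"{name:38s} ${dollars_per_gb(cap, price):.2f}/GB")

ratio = dollars_per_gb(32, 77.0) / dollars_per_gb(256, 99.0)
print(f"32GB Optane costs about {ratio:.1f}x more per GB than the 256GB SSD")
```

This compares capacity only. It says nothing about latency, which is the metric Intel is actually selling.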
You're doing it wrong. Rather than looking for a good shot at just the right moment, you shoot lots of pictures hoping at least one out of the hundred looks decent. And you keep the unused 99 others around because you're too lazy to erase them all.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Optane is Intel's name for 3D XPoint storage. Right now, it's more expensive than NAND storage, and is only available in smaller capacities. That is why they are using it as cache on conventional hard drives. When it becomes cheaper to produce, and in higher capacities, it's going to be great. It will be way faster than NAND, and you won't have to worry about wear-levelling because it doesn't suffer from insulator breakdown.
Yeah, it is not clear from the summary, reading it I thought it was about hybrid drives, but the sizes don't make sense.
So, these are M.2 expansion cards which offer a big and very fast cache for your existing hard drive.
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
A lot of products flat out fail trying to recover R&D expenses. I am not saying this is one of those, as Intel has huge resources behind any tech it brings to market.
The idea here (in the long run) is that drives and "memory" become the same space. Instant on, fast access to nonvolatile RAM, and RAM becomes the equivalent of a fourth-tier processor cache.
I've long predicted that memory space is going to be flattened out and everything is going to be mapped as one big logical drive, measured in access speed to data that is frequently needed. Closer / Faster, Further / Slower
With 64 bit memory addresses, there's no need to differentiate memory vs drive space. Just let the swap manager decide what goes where in the physical world, and each process gets its own dedicated pages of a single memory space.
No can us are belong to bases all your
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
So far, a solid-state cache for a hard drive is an idea that looks great on paper, but practically everything that has been offered shows performance - and we're talking about real workloads and real user experience - closer to the hard drive than to the solid-state device. IMHO, since we apparently have a fairly large number of cache misses or some other anomalies, a solid-state cache that is 1000 times faster than the traditional NAND-based one won't make much difference.
On the other hand, a solid-state device that is only 10 times slower than DDR would make excellent virtual storage. You can put 64GB of DDR4 in your server and then add a 350GB slab of Optane. For all practical purposes you have 350GB of main memory. Swapping the working sets in and out would happen, for all practical purposes, instantly. But of course that's a solution for the data center, not for the regular user.
I think we're *eventually* going to wind up with a unified memory technology that flattens the memory space, but I don't think Optane is it.
When this was first a thing, the Optane access times were a couple of orders of magnitude off RAM. It really read like a newer/better/faster version of existing flash storage media. Of course the critical thing is "Can you make it price competitive with existing NAND?"
If they can't, it's going to be a tough sell. Existing NAND storage has gotten to be fast, durable, cheap and is growing in capacity. While you *can* use faster storage in front of slower capacity storage as a cache, existing NAND is so cheap now that everything is migrating to flash.
Caching works, but it's complex and has overhead penalties, which is one reason why all flash storage has grown in popularity. The consumer wants one drive, not two, and even the enterprise wants speed and simplicity.
I'm curious what Intel's problem is. Is it just an early production capacity problem, or are there yield problems? Or did they drink their own Kool-Aid and think that people wanted to step back to multi-tier storage for their new cache chips?
It would depend on the relative latency and other characteristics. XPoint is definitely not it, because XPoint can't handle unlimited writing. But in some future, let's say we do have a non-volatile storage mechanism that has effectively unlimited durability, like RAM, but which is significantly more dense, like XPoint.
In that situation I can see systems supporting a chunk of that sort of storage as if it were memory.
Latency matters greatly here for several reasons. First, I don't think XPoint is quite fast enough, at least not yet. The problem with any sort of high-latency storage being treated like memory at the HARDWARE level is that the latency creates massive stalls on the cpu. Even DRAM today causes huge many-clock stalls on a cpu. These stalls are transparent to the operating system, so the operating system cannot just switch to another thread or do other work during the stall. The stall effectively reduces the performance of the system. This is the #1 problem with treating any sort of storage technology as if it were memory.
The #2 problem is that memory is far easier to corrupt than storage (which requires a block transaction to write). I would never want to map my filesystem's entire block device directly into memory, for example. It's just too dangerous.
The solution that exists today is, of course, swap space. You simply configure your swap on an SSD. The latencies are obviously much higher than they would be for a HW XPoint style solution, around 50-100uS to take a page-fault requiring I/O from an NVMe SSD, for example.
The difference though is that the operating system knows that it is taking the page-fault and can switch to another runnable thread in the mean time, so the CPU is not stalled for 50-100uS. It's doing other work. Given enough pending work, the practical overhead of a page-fault in terms of lost CPU time is only around 2-4uS.
In an XPoint-like all-hardware solution, the CPU will stall on the miss. If the XPoint 'pagein' time is 1-2uS, then the all-hardware solution winds up only being twice as good as the swap space solution in terms of CPU cycles. Of course, the all-hardware solution will be far better in terms of latency (1-2uS versus 50-100uS).
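The "only twice as good" arithmetic can be sketched in a few lines. The inputs are the midpoints of the ranges given in this comment; the miss count and the function name are hypothetical, chosen only to make the comparison visible.

```python
def lost_cpu_us(n_misses, cpu_cost_per_miss_us):
    """CPU time (microseconds) that does no useful work across n_misses."""
    return n_misses * cpu_cost_per_miss_us

MISSES = 1_000_000        # hypothetical number of misses, for illustration

# Midpoints of the ranges given in the comment.
SWAP_CPU_COST_US = 3.0    # 2-4 uS of overhead; the OS runs another thread during the I/O
SWAP_LATENCY_US = 75.0    # 50-100 uS until the faulting thread has its data
HW_STALL_US = 1.5         # 1-2 uS; the CPU is stalled for the whole access

swap_cpu = lost_cpu_us(MISSES, SWAP_CPU_COST_US)
hw_cpu = lost_cpu_us(MISSES, HW_STALL_US)

print(f"swap on NVMe: {swap_cpu / 1e6:.1f} s of CPU lost")
print(f"hardware map: {hw_cpu / 1e6:.1f} s of CPU lost")
print(f"CPU-time advantage of hardware: {swap_cpu / hw_cpu:.1f}x")
print(f"latency advantage of hardware:  {SWAP_LATENCY_US / HW_STALL_US:.0f}x")
```

The swap figure assumes there is always another runnable thread to switch to. With no pending work, the full page-fault latency is lost instead.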
But to really work in this format the non-volatile memory needs to have a nearly unlimited write capability. XPoint does not. XPoint only has around 33,000 write cycles of durability per cell (and that's being generous). It needs to be half a million at a minimum and at least 10 million to *really* be useful.
-Matt
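To see why the write-cycle figure matters, here is a rough worst-case estimate. It assumes perfect wear-leveling and continuous writing at the rated 280MBps from the summary, which is far harsher than a desktop cache workload but closer to what memory-like use could do. The 33,000-cycle figure is the commenter's estimate, not an Intel specification.

```python
def seconds_to_wear_out(capacity_bytes, cycles_per_cell, write_bytes_per_sec):
    """Time to exhaust every cell, assuming writes are spread perfectly evenly."""
    total_writable = capacity_bytes * cycles_per_cell
    return total_writable / write_bytes_per_sec

CAPACITY = 32e9       # 32GB module from the summary
WRITE_RATE = 280e6    # rated write speed from the summary, assumed sustained

# Commenter's estimate, then the two thresholds named in the comment.
for cycles in (33_000, 500_000, 10_000_000):
    days = seconds_to_wear_out(CAPACITY, cycles, WRITE_RATE) / 86_400
    print(f"{cycles:>10,} cycles: {days:,.0f} days of continuous writing")
```

At 33,000 cycles the module lasts on the order of six weeks under this worst-case load, which is fine for a read-mostly cache and not for something treated as RAM.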
This!
My first thought was exactly this. You can have a Samsung 960 EVO, that is three times faster in read and over five times faster in write speeds for only twice the money of that Intel module. And it has a capacity of 250 GB, not 32 GB. If Samsung would make a 960 EVO 128GB model, the entire Intel product line would be dead in the water. Oh, wait. They have, somewhat... the SM961 128GB, which is both faster and about as expensive as 32 Intel GBs.
Sorry Intel, and thanx for the deja-vu moment, for my second thought was: "Oh, my god, this is Intel Turbo Memory / Robson Modules (tm) all over again!"
Certainly faster writing. Read speed is about the same for the EVO (on real blocks of uncompressible data, not the imaginary compressible or zeroed blocks that they use to report their 'maximum').
XPoint over NVMe has only two metrics that people need to know about to understand how it fits into the ethos: (1) More durability, up to 33,000 rewrites apparently (many people have had to calculate it, Intel refuses to say outright what it is because it is so much lower than what they originally said it would be). (2) Lower latency.
So, for example, NVMe devices using Intel's XPoint have an advertised latency of around 10uS. That is, you submit a READ request, and 10uS later you have the data in hand. The 960 EVO, which I have one around here somewhere... ah, there it is... the 960 EVO has a read latency of around 87uS.
This is called the QD1 latency. It does not translate to the full bandwidth of the device as you can queue multiple commands to the device and pipeline the responses. In fact, a normal filesystem sequential read always queues read-ahead I/O so even an open/read*/close sequence generally operates at around QD4 (4 read commands in progress at once) and not QD1.
Here's the 960 EVO and some randread tests on it at QD1 and QD4.
nvme1: mem 0xc7500000-0xc7503fff irq 32 at device 0.0 on pci2
nvme1: mapped 8 MSIX IRQs
nvme1: NVME Version 1.2 maxqe=16384 caps=00f000203c033fff
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Interrupt Coalesce: 100uS / 4 qentries
nvme1: Disk nvme1 ns=1 blksize=512 lbacnt=488397168 cap=232GB serno=S3ESNX0J219064Y-1
(/dev/nvme1s1b is a partition filled with uncompressible data)
xeon126# randread /dev/nvme1s1b 4096 100 1
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 1
11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
11458/s avg= 87.28uS bw=46.92 MB/s lo=68.50uS, hi=154.20uS stddev=7.01uS
11469/s avg= 87.19uS bw=46.98 MB/s lo=69.97uS, hi=151.97uS stddev=6.95uS
11477/s avg= 87.13uS bw=47.01 MB/s lo=69.31uS, hi=158.03uS stddev=7.03uS
And here is QD4 (really QD1 x 4 threads on 4 HW queues):
xeon126# randread /dev/nvme1s1b 4096 100 4
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 4
44084/s avg= 90.74uS bw=180.57MB/s lo=65.17uS, hi=237.92uS stddev=16.94uS
44205/s avg= 90.49uS bw=181.05MB/s lo=65.38uS, hi=222.21uS stddev=16.56uS
44202/s avg= 90.49uS bw=181.04MB/s lo=65.19uS, hi=221.48uS stddev=16.72uS
44131/s avg= 90.64uS bw=180.75MB/s lo=64.44uS, hi=245.91uS stddev=16.81uS
44210/s avg= 90.48uS bw=181.08MB/s lo=63.73uS, hi=232.05uS stddev=16.74uS
So, as you can see, at QD1 the 960 EVO is doing around 11.4K transactions/sec and at QD4 it is doing around 44K transactions/sec. If I use a larger block size you can see the bandwidth lift off:
xeon126# randread /dev/nvme1s1b 32768 100 4
device /dev/nvme1s1b bufsize 32768 limit 16.000GB nprocs 4
19997/s avg=200.03uS bw=655.26MB/s lo=125.02uS, hi=503.26uS stddev=55.24uS
20090/s avg=199.10uS bw=658.23MB/s lo=124.62uS, hi=522.04uS stddev=54.83uS
20034/s avg=199.66uS bw=656.47MB/s lo=123.63uS, hi=495.74uS stddev=55.59uS
20008/s avg=199.92uS bw=655.62MB/s lo=123.50uS, hi=500.24uS stddev=55.92uS
20034/s avg=199.66uS bw=656.47MB/s lo=125.17uS, hi=488.30uS stddev=55.02uS
20000/s avg=200.00uS bw=655.35MB/s lo=123.19uS, hi=504.18uS stddev=55.98uS
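The numbers in these runs hang together through Little's law: operations per second is the number of requests in flight divided by the average latency, and bandwidth is operations per second times block size. A quick sanity check, using the first output line of each run (the function names are illustrative, and MB here means decimal megabytes, which is what matches the tool's output):

```python
def iops(queue_depth, avg_latency_us):
    """Little's law: throughput = requests in flight / average latency."""
    return queue_depth / (avg_latency_us * 1e-6)

def bandwidth_mb_s(ops_per_sec, block_bytes):
    """Bandwidth in decimal megabytes per second."""
    return ops_per_sec * block_bytes / 1e6

# (queue depth, block size in bytes, average latency in uS, reported ops/s)
runs = [
    (1, 4096, 85.20, 11737),
    (4, 4096, 90.74, 44084),
    (4, 32768, 200.03, 19997),
]

for qd, block, lat_us, reported in runs:
    predicted = iops(qd, lat_us)
    print(f"QD{qd} {block:>5}B: predicted {predicted:8.0f}/s, reported {reported}/s, "
          f"bw {bandwidth_mb_s(reported, block):.2f} MB/s")
```

This is why QD1 latency alone does not tell you the bandwidth of a device: quadrupling the requests in flight roughly quadruples throughput as long as latency stays flat.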
And if I use a deeper queue I can max out the bandwidth. On this particular device, random blocks of uncompressible data at 32KB limit out at around 1 GByte/sec. I'll also show 64KB and 128KB:
xeon126# randread /dev/nvme1s1b 32768 100 64 /dev/nvme1s1
device