With Optane Memory, Intel Claims To Make Hard Drives Faster Than SSDs (pcworld.com)
SSDs are generally faster than hard drives. However, they are also usually more expensive. Intel wants to change that with its new Optane Memory lineup, which it claims is faster and better performing than SSDs while not requiring customers to break the bank. From a report on PCWorld: Announced Monday morning, these first consumer Optane-based devices will be available April 24 in two M.2 trims: A 16GB model for $44 and a 32GB Optane Memory device for $77. Both are rated for crazy-fast read speeds of 1.2GBps and writes of 280MBps. [...] When the price of a 128GB SATA SSD is roughly $50 to $60 today, you may rightly wonder why Optane Memory would be worth the bother. Intel says most consumers just don't want to give up the capacity for their photos and videos. PC configurations with a hard drive and an SSD, while standard for higher-end PC users, isn't popular for the newbies. Think of the times you've had friends or family fill up the boot drive with cat pictures, but the secondary drive is nearly empty. Intel Optane Memory would give that mainstream user the same or better performance as an SSD, with the capacity advantage of the 1TB or 2TB drive they're used to. Intel claims Optane Memory performance is as good or better than an SSD's, offering better latency by magnitudes and the ability to peak at much lower queue depths.
Optane is Intel's name for 3D XPoint storage. Right now, it's more expensive than NAND storage, and is only available in smaller capacities. That is why Intel is using it as a cache for conventional hard drives. When it becomes cheaper to produce, and in higher capacities, it's going to be great. It will be way faster than NAND, and you won't have to worry about wear-levelling because it doesn't suffer from insulator breakdown.
Right. They are trying to market it as something cool and new, which would be great except for the fact that it isn't cool OR new. A person can already use ANY storage device to accelerate any OTHER storage device. Dozens of 'drive accelerators' have been on the market for years. So if a person really wanted to, they could trivially use a small NAND-flash-based NVMe SSD to do the same thing, and get better results because they'll have a lot more flash. A person could even use a normal SATA SSD for the same purpose.
What Intel is not telling people is that NOBODY WILL NOTICE the lower latency of their XPoint product. At (I am assuming for this product) 10uS the Intel XPoint NVMe is roughly 1/6 the latency of a Samsung NVMe device. Nobody is going to notice the difference between 10uS and 60uS. Even most *server* workloads wouldn't care. But I guarantee that people WILL notice the fact that the Intel device is caching much less data than they could be caching for the same money with a NAND-based NVMe SSD or even just a SATA SSD.
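To put rough numbers on that argument, here is a back-of-the-envelope model of the average access time of a cached hard drive. Only the 10uS and 60uS figures come from the comparison above; the 10 ms hard-drive miss penalty and both hit rates are illustrative assumptions, not measurements, and the function name is made up for this sketch.

```python
# Back-of-the-envelope model: average access time of a cached hard drive.
# ASSUMPTIONS (not measurements): ~10 ms HDD random access, and the two
# hit rates used below, which are chosen only to illustrate the trade-off.

HDD_MISS_US = 10_000.0  # assumed hard-drive random access time, in microseconds


def effective_latency_us(hit_rate, cache_latency_us, miss_latency_us=HDD_MISS_US):
    """Average access time: hits are served by the cache, misses by the disk."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency_us + (1.0 - hit_rate) * miss_latency_us


# Small, very fast cache (10uS per hit) with a hypothetical 80% hit rate.
small_fast = effective_latency_us(0.80, 10.0)
# Larger, slower cache (60uS per hit) with a hypothetical 95% hit rate.
large_slow = effective_latency_us(0.95, 60.0)

print(f"small/fast cache: {small_fast:.0f} uS average")
print(f"large/slow cache: {large_slow:.0f} uS average")
```

Under these assumed hit rates the misses dominate the average, so the cache that holds more data comes out ahead despite its higher per-hit latency. Different hit rates would shift the numbers, so treat this as a way to reason about the trade-off rather than a prediction.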
In other words, Intel's product is worthless.
-Matt
Certainly faster writing. Read speed is about the same for the EVO (on real blocks of uncompressible data, not the imaginary compressible or zeroed blocks that they use to report their 'maximum').
XPoint over NVMe has only two metrics that people need to know about to understand how it fits into the ecosystem: (1) More durability, apparently up to 33,000 rewrites (many people have had to calculate it; Intel refuses to say outright what it is because it is so much lower than what they originally said it would be). (2) Lower latency.
So, for example, NVMe devices using Intel's XPoint have an advertised latency of around 10uS. That is, you submit a READ request, and 10uS later you have the data in hand. The 960 EVO, which I have one around here somewhere... ah, there it is... the 960 EVO has a read latency of around 87uS.
This is called the QD1 latency. It does not translate to the full bandwidth of the device as you can queue multiple commands to the device and pipeline the responses. In fact, a normal filesystem sequential read always queues read-ahead I/O so even an open/read*/close sequence generally operates at around QD4 (4 read commands in progress at once) and not QD1.
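As a rough sanity check on how latency and queue depth relate to transaction rate, here is a small sketch of the idealized relation (requests in flight divided by per-request latency). The function name is made up for illustration, and the model assumes latency does not rise as the queue gets deeper, which only holds until the device saturates.

```python
# Idealized relation between per-request latency, queue depth, and IOPS.
# ASSUMPTION: latency stays constant as queue depth grows, which is only
# true until the device (or the bus) saturates.


def ideal_iops(latency_us, queue_depth=1):
    """Requests completed per second with `queue_depth` requests always in flight."""
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    return queue_depth * 1_000_000.0 / latency_us


# Latency figures quoted above: ~87uS for the 960 EVO, ~10uS advertised for XPoint.
for label, latency in (("960 EVO (~87uS)", 87.0), ("XPoint (~10uS advertised)", 10.0)):
    for qd in (1, 4):
        print(f"{label} QD{qd}: {ideal_iops(latency, qd):,.0f} IOPS")
```

At 87uS this works out to roughly 11.5K requests per second at QD1, and four times that at QD4 for as long as the latency holds steady.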
Here's the 960 EVO and some randread tests on it at QD1 and QD4.
nvme1: mem 0xc7500000-0xc7503fff irq 32 at device 0.0 on pci2
nvme1: mapped 8 MSIX IRQs
nvme1: NVME Version 1.2 maxqe=16384 caps=00f000203c033fff
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Interrupt Coalesce: 100uS / 4 qentries
nvme1: Disk nvme1 ns=1 blksize=512 lbacnt=488397168 cap=232GB serno=S3ESNX0J219064Y-1
(/dev/nvme1s1b is a partition filled with uncompressible data)
xeon126# randread /dev/nvme1s1b 4096 100 1
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 1
11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
11458/s avg= 87.28uS bw=46.92 MB/s lo=68.50uS, hi=154.20uS stddev=7.01uS
11469/s avg= 87.19uS bw=46.98 MB/s lo=69.97uS, hi=151.97uS stddev=6.95uS
11477/s avg= 87.13uS bw=47.01 MB/s lo=69.31uS, hi=158.03uS stddev=7.03uS
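For anyone without the randread tool, the same kind of QD1 measurement can be approximated with a short script. This is a rough sketch and not the tool used above: the function name `randread_qd1` and its parameters are made up for illustration, and reads may be served from the OS cache rather than the device, so the numbers it reports are illustrative only.

```python
# Rough QD1 random-read timing sketch. NOT the randread tool used above;
# names and parameters here are hypothetical.
import os
import random
import statistics
import tempfile
import time


def randread_qd1(path, block_size=4096, count=1000, seed=0):
    """Issue `count` block-aligned random reads, one at a time, and time each."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end yields the size of a regular file.
        size = os.lseek(fd, 0, os.SEEK_END)
        nblocks = size // block_size
        if nblocks == 0:
            raise ValueError("target is smaller than one block")
        latencies_us = []
        for _ in range(count):
            offset = rng.randrange(nblocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # one request in flight: QD1
            latencies_us.append((time.perf_counter() - start) * 1e6)
    finally:
        os.close(fd)
    avg = statistics.mean(latencies_us)
    return {
        "iops": 1e6 / avg,
        "avg_us": avg,
        "lo_us": min(latencies_us),
        "hi_us": max(latencies_us),
        "stddev_us": statistics.pstdev(latencies_us),
    }


if __name__ == "__main__":
    # Demo on a 1 MiB scratch file of random (uncompressible) data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 20))
    try:
        r = randread_qd1(tmp.name)
        print(f"{r['iops']:.0f}/s avg={r['avg_us']:.2f}uS "
              f"lo={r['lo_us']:.2f}uS hi={r['hi_us']:.2f}uS stddev={r['stddev_us']:.2f}uS")
    finally:
        os.unlink(tmp.name)
```

Running several copies of this loop in parallel, one per thread or process, would correspond to the "QD1 x N threads" style of test.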
And here is QD4 (really QD1 x 4 threads on 4 HW queues):
xeon126# randread /dev/nvme1s1b 4096 100 4
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 4
44084/s avg= 90.74uS bw=180.57MB/s lo=65.17uS, hi=237.92uS stddev=16.94uS
44205/s avg= 90.49uS bw=181.05MB/s lo=65.38uS, hi=222.21uS stddev=16.56uS
44202/s avg= 90.49uS bw=181.04MB/s lo=65.19uS, hi=221.48uS stddev=16.72uS
44131/s avg= 90.64uS bw=180.75MB/s lo=64.44uS, hi=245.91uS stddev=16.81uS
44210/s avg= 90.48uS bw=181.08MB/s lo=63.73uS, hi=232.05uS stddev=16.74uS
So, as you can see, at QD1 the 960 EVO is doing around 11.4K transactions/sec and at QD4 it is doing around 44K transactions/sec. If I use a larger block size you can see the bandwidth lift off:
xeon126# randread /dev/nvme1s1b 32768 100 4
device /dev/nvme1s1b bufsize 32768 limit 16.000GB nprocs 4
19997/s avg=200.03uS bw=655.26MB/s lo=125.02uS, hi=503.26uS stddev=55.24uS
20090/s avg=199.10uS bw=658.23MB/s lo=124.62uS, hi=522.04uS stddev=54.83uS
20034/s avg=199.66uS bw=656.47MB/s lo=123.63uS, hi=495.74uS stddev=55.59uS
20008/s avg=199.92uS bw=655.62MB/s lo=123.50uS, hi=500.24uS stddev=55.92uS
20034/s avg=199.66uS bw=656.47MB/s lo=125.17uS, hi=488.30uS stddev=55.02uS
20000/s avg=200.00uS bw=655.35MB/s lo=123.19uS, hi=504.18uS stddev=55.98uS
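The bandwidth column in these runs is just the transaction rate multiplied by the block size, which is easy to check. A minimal sketch, assuming the tool reports decimal megabytes (10^6 bytes), which is what the figures above are consistent with; the function name is made up for illustration.

```python
# Cross-check: bandwidth = transactions per second * block size.
# ASSUMPTION: "MB/s" in the output means 10**6 bytes per second.


def bandwidth_mb_per_s(iops, block_size_bytes):
    """Bandwidth in decimal MB/s for a given request rate and block size."""
    return iops * block_size_bytes / 1_000_000.0


# (requests/s, block size in bytes) taken from the runs shown above.
samples = [
    (11737, 4096),   # QD1, 4KB blocks
    (44084, 4096),   # QD4, 4KB blocks
    (20000, 32768),  # QD4, 32KB blocks
]

for iops, block_size in samples:
    print(f"{iops}/s at {block_size} bytes: {bandwidth_mb_per_s(iops, block_size):.2f} MB/s")
```

The results agree with the reported bw= values to within rounding, which is why the larger block size lifts the bandwidth even though the transaction rate drops.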
And if I use a deeper queue I can max out the bandwidth. On this particular device, random reads of uncompressible data at 32KB top out at around 1 GByte/sec. I'll also show 64KB and 128KB:
xeon126# randread /dev/nvme1s1b 32768 100 64
device /dev/nvme1s1