Triple M.2 NVMe RAID-0 Testing Proves Latency Reductions
Vigile writes: The gang over at PC Perspective just posted a story that looks at a set of three M.2 form factor Samsung 950 Pro NVMe PCIe SSDs in a RAID-0 array, courtesy of a new motherboard from Gigabyte that included three M.2 slots. The pure bandwidth available in this configuration is amazing, breaching 3.3 GB/s on reads and 3.0 GB/s on writes. But what is more interesting is a new testing methodology that captures the latency of individual storage IOs, giving us a look at the performance of SSDs in all configurations. What PC Perspective proved here is that users who often claim that RAIDs "feel faster," despite a lack of bandwidth results to prove it, are likely correct. Measurements now show that the latency of IO operations improves dramatically as you add drives to an array, giving a feeling of "snappiness" to a system beyond even what a single SSD can offer. PC Perspective's new testing demonstrates the triple RAID-0 array having just 1/6th of the latency of a single drive.
On VMs in a home lab I see no difference between raid 0 and no raid with SATA on Samsung Pros, outside of benchmarks.
However, my VMs do boot quicker with the Samsungs than with the SanDisks that replaced them. The IOPS were better, and that made them boot and shut down quicker. I imagine it is the same at work with database or production VMs too.
http://saveie6.com/
Gaming.
"If you have nothing to hide, you have nothing to fear." - Every fascist, ever
Remember kids, losing just one drive dumps the entire array. It's really not appropriate for anything besides completely transient data (scratch disks, stuff like this benchmark, etc.). Not smart to run your OS on RAID 0. RAID 10, OTOH . . . now we're talking.
But does raid 0 support sharing? Is it the secret ingredient in the async sauce?
They did post a quick speed graph for raid-5 on the drives; obviously writes took a hit, but reads were almost exactly the same as raid-0 across three drives.
Karnal
So, basically it partly takes seek time out of the equation, or something similar?
Because then in theory I guess you can be serving multiple requests instead of just one at a time.
Doesn't seem entirely unreasonable. If the latency is spread out a little, it may not seem as big for any individual transaction.
Lost at C:>. Found at C.
I've been building PCs long enough to remember a time when things were improving so quickly that it made no sense to keep a computer for more than 4 years. But since then, the progress in CPU performance has reached a plateau. People like me, who bought a good Sandy Bridge system in 2011, still have a system that doesn't come close to feeling crippled and lazy. We don't have much reason to envy the people who bought the latest generation of i5/i7 systems. Five years used to mean an order of magnitude improvement in performance. Now it's not even a doubling. I've sometimes wondered when I will finally start feeling the urge to upgrade my system.
These SSD latency numbers are the first thing I've seen that gave me the feeling that there is some truly worthwhile trick that my present computer can't come close to matching. I'm not saying that I now want to upgrade, but on reading this, I have become upgrade-curious for the first time in many years.
These results seem very questionable. Their graphs claim that in some configurations almost all 4k read requests are handled within 100 ns. But getting even a single DRAM burst from a random DRAM location already takes almost 100 ns, even though the memory controller is connected through a much tighter interface optimized for low latency, and PCIe is much slower than a DRAM interface. Even without overhead, 4 PCIe 3.0 lanes (8 GB/s) can only transfer 8 KB per µs. Transferring a 4 KB block should thus take at least 0.5 µs, or 500 ns, and that does not include any overhead, nor the time needed to actually send the request to the SSD, open the page in the NAND flash, and run ECC and decompression.
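A quick way to check the arithmetic above is to compute the raw wire time for one 4 KiB block. The sketch below is illustrative (the function and constant names are made up) and uses the comment's 8 GB/s figure alongside the one-direction payload rate of a PCIe 3.0 x4 link: 8 GT/s per lane with 128b/130b encoding, about 3.94 GB/s. The 8 GB/s figure roughly corresponds to both directions combined, so a read that flows only device-to-host has a floor closer to 1 µs. Both numbers ignore TLP headers, command submission, NAND access, and ECC.

```python
def transfer_time_us(payload_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Minimum wire time for a payload in microseconds, ignoring all protocol overhead."""
    return payload_bytes / bandwidth_bytes_per_s * 1e6


# The comment's figure for four PCIe 3.0 lanes (decimal GB/s).
COMMENT_BW = 8e9

# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 8 bits per byte, 4 lanes, one direction.
PCIE3_X4_ONE_WAY = 4 * 8e9 * (128 / 130) / 8

BLOCK = 4096  # one 4 KiB read
print(f"{transfer_time_us(BLOCK, COMMENT_BW):.3f} us")        # prints 0.512 us
print(f"{transfer_time_us(BLOCK, PCIE3_X4_ONE_WAY):.3f} us")  # prints 1.040 us
```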
Jan
That Gigabyte board is looking more and more attractive.
And I'm actually due for a system rebuild this year...
Chas - The one, the only.
THANK GOD!!!
PC Perspective's new testing demonstrates the triple RAID-0 array having just 1/6th of the latency of a single drive.
That was with a queue depth of 16. Not exactly representative of a normal desktop user.
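For context on why queue depth matters, Little's law ties the quantities together: outstanding I/Os = throughput × mean latency. A minimal Python sketch with hypothetical IOPS numbers, not PC Perspective's measurements: at a fixed total queue depth, striping across three drives leaves each member with about a third of the outstanding requests, which is one plausible reason per-I/O latency drops under load. At queue depth 1 there is nothing to spread, which fits the thread's later note that minimum latencies were slightly longer in the array.

```python
def mean_latency_us(queue_depth: float, iops: float) -> float:
    """Little's law rearranged: mean latency = outstanding I/Os / throughput (in microseconds)."""
    return queue_depth / iops * 1e6


def per_drive_queue_depth(total_qd: float, drives: int) -> float:
    """Outstanding I/Os each member sees if requests spread evenly across the stripe."""
    return total_qd / drives


# Hypothetical single drive sustaining 200,000 IOPS at QD 16.
print(mean_latency_us(16, 200_000))   # mean latency in microseconds
print(per_drive_queue_depth(16, 3))   # about 5.33 outstanding I/Os per member
```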
You're right. I/O is where the best improvements have been coming lately.
I'd still be using a Core2Quad if it wasn't for the platform's outdated I/O features.
PCIe 3.0, SATA 3, USB3, DDR4, and surrounding technologies like M.2 and NVMe are the real reason to upgrade. The newer platforms are faster in most practical applications simply because they can feed the data to the CPU faster.
Intel's also been focusing on power saving. The new parts sip power in comparison to the decade-old Core 2.
Hopefully we start seeing boards transitioning over to offering connectivity for ever-greater numbers of M.2 drives.
For RAID-0, the big issue is "lose a drive and you're fucked".
For RAID-5, the big issue is "lose a drive on a large-enough array and you could be looking at an unrecoverable read error during the array recovery".
Granted, most of the people who are using these setups are frothing gamers and hardware junkies who aren't keeping anything truly valuable within those filesystems.
But for those who are looking for truly dependable storage solutions, they should be looking at RAID-10 or better, or looking to offload their storage needs to a device that can handle something like RAID-6 or high-level ZFS.
Too bad Skylake only has 16+4 PCIe 3.0 lanes from the CPU. That's why Intel needs to put QPI in all CPUs and drive the chipset off of QPI and not DMI.
Yes, unfortunately the mu did not show up. Let's see if the html entity works: nope
Seems you have to use u for mu on slashdot.
The SSD controller already does a form of this, as it is talking to multiple flash memory dies over multiple channels. RAID is just another layer to get even more performance out of more parallelism (and as we figured out in testing, to considerably drop the latency under load).
Allyn Malventano
Storage Editor, PC Perspective
this sig was brought to you by the letter
All testing was performed with the default setting (cache disabled). Further, cache settings have little effect on NVMe RAIDs on Z170. Additionally, our minimum latencies were 6us *longer* in an array vs. a single SSD, so clearly no caching was taking place.
I have found these calculators work well for projecting the performance and capacity of various RAID levels:
http://wintelguy.com/raidperf.pl/?formid=1&raidtype=R0&ndg=2&ng=1
http://wintelguy.com/raidperf.pl/?formid=2&raidtype=R0&ndg=2&ng=1
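For a rough feel of what such calculators do, here is a Python sketch of the common rules of thumb: usable capacity by RAID level, plus a random-write penalty (back-end I/Os per front-end write: 1 for RAID 0, 2 for RAID 10, 4 for RAID 5, 6 for RAID 6). These are textbook approximations, not the formulas those pages use; they ignore controller caches, full-stripe writes, and SSD internal parallelism. The function name and the per-drive IOPS figure are hypothetical.

```python
from dataclasses import dataclass

# Back-end I/Os per random front-end write (classic rule of thumb).
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}
MIN_DRIVES = {"RAID0": 2, "RAID10": 4, "RAID5": 3, "RAID6": 4}


@dataclass
class RaidEstimate:
    usable_tb: float
    read_iops: float
    write_iops: float


def estimate(level: str, drives: int, drive_tb: float, drive_iops: float) -> RaidEstimate:
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "RAID10" and drives % 2:
        raise ValueError("RAID10 needs an even number of drives")
    # Number of drives' worth of capacity left after mirroring or parity.
    data_drives = {
        "RAID0": drives,
        "RAID10": drives / 2,
        "RAID5": drives - 1,
        "RAID6": drives - 2,
    }[level]
    return RaidEstimate(
        usable_tb=data_drives * drive_tb,
        read_iops=drives * drive_iops,  # every member can serve random reads
        write_iops=drives * drive_iops / WRITE_PENALTY[level],
    )


# Three 512 GB drives with a hypothetical 100,000 random IOPS each.
print(estimate("RAID0", 3, 0.512, 100_000))
print(estimate("RAID5", 3, 0.512, 100_000))
```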
Some other guy mentioned RAID 10 isn't a backup strategy; he's correct (no RAID level is), however one thing to keep in mind is that when his RAID 0 array dies, he'd better hope his back-ups are all up-to-date and restorable. With just about any other RAID level, you get an opportunity to replace the dead / dying drive first, start rebuilding, and KEEP ROLLING, with no need to screw around with backups at all and no human interaction even required if your array has a hot spare configured. Yes, technically that is availability, but I'd sure as hell take it over "fuck, there went one drive of my RAID 0 stripe, better hope I can tolerate this downtime and that my last backup set had everything I needed on it." RAID may not be a backup strategy but it's absolutely another layer in place before you need to restore from backups (as long as it's not RAID 0).
A RAID0 of 3 SSDs will be faster than a single SSD drive for multiple reasons, the primary one being that the kernel reads from all three devices at the same time, and (secondarily) both the SSDs and the kernel are doing read-ahead, so that once hundreds (if not thousands) of sectors are in memory, you're only looking at the time to copy them into the destination buffer. For more speed, set your read-ahead buffers up - /sys/devices/(device)/hostX/targetX:0:0/X:0:0:0/block/sde/queue/read_ahead_kb, in Linux...
Usually defaults to 128KB....
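The same knob can be read and changed from a script. A minimal Python sketch, assuming the usual sysfs layout where /sys/block/<dev> is a symlink into the /sys/devices/... tree quoted above; the helper names are hypothetical, writing needs root, and the value resets at reboot. The `blockdev --getra`/`--setra` commands adjust the same setting, but count in 512-byte sectors rather than KB.

```python
from pathlib import Path


def read_ahead_path(device: str, sysfs_root: str = "/sys") -> Path:
    # /sys/block/<dev> links to the device's directory under /sys/devices.
    return Path(sysfs_root) / "block" / device / "queue" / "read_ahead_kb"


def get_read_ahead_kb(device: str, sysfs_root: str = "/sys") -> int:
    return int(read_ahead_path(device, sysfs_root).read_text().strip())


def set_read_ahead_kb(device: str, kb: int, sysfs_root: str = "/sys") -> None:
    # Requires root on a real system; not persistent across reboots.
    read_ahead_path(device, sysfs_root).write_text(f"{kb}\n")


# Example (as root): raise read-ahead on sde from the usual 128 KB to 1 MB.
# set_read_ahead_kb("sde", 1024)
```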
Not smart to run your OS on RAID 0.
Why? You're assuming the OS is something that I can't just re-install? Remember that you're only slightly more than doubling the failure rate. Given the incredibly low failure rates it's not like you're guaranteed to lose things constantly.
Let me get this straight. When you have more devices available to service read or write requests, the time that it takes to service the request goes down.
What next? Are we going to be told that RAID5 gives better read performance than a single drive too?
The gang over at PC Perspective...
gangs, presumably roving, are taking over websites now! YOUR SITE COULD BE NEXT!
Anons need not reply. Questions end with a question mark.
Most SSDs do RAID internally across several dies already.
If programs would be read like poetry, most programmers would be Vogons.
There are several things that affect the latency. You will get about 200us of latency for a program operation on a die, but that can be reduced by using some caching and acknowledging writes before the program finishes. That cache can be saturated by sustained writes, though, especially random writes that cause a high level of map updates. As the cache fills, the latency of sustained random writes will eventually climb to the latency of two program operations, which is 400us. With multiple controllers, only every third write goes to a specific controller. The latency goes down because you don't sustain writes to each controller at a level that floods the cache.
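The "every third write" point follows from how RAID-0 lays data out. A minimal Python sketch of a plain striped layout, assuming equal-size members and a fixed chunk size (real implementations such as Linux md add details like zones for unequal devices); the function name and chunk size are hypothetical. Consecutive chunks rotate across the drives, so a stream of chunk-sized writes gives each of three drives one write in three.

```python
from collections import Counter


def locate(lba: int, drives: int, chunk_blocks: int) -> tuple[int, int]:
    """Map an array block address to (member drive index, block address on that member)."""
    chunk, offset_in_chunk = divmod(lba, chunk_blocks)
    drive = chunk % drives   # chunks rotate round-robin across members
    row = chunk // drives    # full stripes that precede this chunk
    return drive, row * chunk_blocks + offset_in_chunk


# Hypothetical 3-drive array with 128-block chunks: issue 9 chunk-aligned writes.
hits = Counter(locate(i * 128, 3, 128)[0] for i in range(9))
print(hits)  # each drive receives 3 of the 9 writes
```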
For RAID-5, the big issue is "lose a drive on a large-enough array and you could be looking at an unrecoverable read error during the array recovery".
This gets repeated a lot, but isn't a problem for any halfway decent RAID setup because they slowly read data from the drives in the background (called patrol read on LSI/Dell controllers). The chances of a problem with a drive not turning up in one of the numerous patrol reads yet happening during a recovery are astronomically small.
Not completely true. With 6 or 8 TB drives, you are looking at a few days to a week or so of the raid rebuilding. During this time, you have the protection of raid 0 without the speed.
In Soviet Russia the insensitive clod is YOU!
Any classic RAID level is useless if you want data safety. Say one of your drives in RAID10/5/6 returns garbled data (without an error): which copy/parity do you trust?
Also, many 5/6 implementations won't actually calculate the parity chunk on reads, only for rebuilds. There are some pricey controllers that do full checksumming à la ZFS on chip, but as with most hardware systems, your controller becomes the SPOF.
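The checksumming approach can be sketched in a few lines of Python. The key idea is that the checksum is recorded separately from the data at write time (ZFS keeps it in the parent block pointer), so on a read each mirror copy can be verified on its own and a silently corrupted copy identified and rewritten. SHA-256 stands in for whatever checksum a real filesystem uses, and the function names are hypothetical.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def read_mirrored(copies: list[bytes], stored: bytes) -> tuple[bytes, list[int]]:
    """Return a copy matching the checksum recorded at write time, plus the
    indices of copies that failed verification (candidates for repair)."""
    good = None
    bad = []
    for i, data in enumerate(copies):
        if checksum(data) == stored:
            if good is None:
                good = data
        else:
            bad.append(i)
    if good is None:
        raise OSError("every copy failed checksum verification")
    return good, bad


block = b"important data"
stored = checksum(block)
data, bad = read_mirrored([b"importent data", block], stored)
print(data, bad)  # the intact copy, and [0] flagged as corrupt
```

Without the separately stored checksum, a plain two-way mirror has two different answers and no way to pick one.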
With the drives becoming ever larger and faster, more data is being read but the errors per terabyte read are not really decreasing so the probability of you reading an error is nearing 1 faster than ever.
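The RAID-5 rebuild worry raised earlier in the thread usually comes from this back-of-the-envelope model: treat every bit read as an independent trial that fails at the drive's spec-sheet unrecoverable read error (URE) rate, commonly quoted as 1 in 10^14 bits for consumer drives and 1 in 10^15 for enterprise ones. A minimal Python sketch; the example array and function name are hypothetical. The model is pessimistic (real drives may beat their spec, and background patrol reads catch latent errors first), but it shows why the odds grow with capacity.

```python
import math


def p_ure_during_read(bytes_read: float, bits_per_error: float = 1e14) -> float:
    """Probability of at least one URE while reading bytes_read bytes, modelling each
    bit as an independent trial with error probability 1 / bits_per_error."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, via log1p/expm1 so tiny per-bit rates keep their precision.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


# Hypothetical 4-drive RAID-5 of 3 TB disks: a rebuild reads the 3 survivors in full.
survivor_bytes = 3 * 3e12
print(f"1e14 spec: {p_ure_during_read(survivor_bytes):.1%}")        # about 51%
print(f"1e15 spec: {p_ure_during_read(survivor_bytes, 1e15):.1%}")  # about 7%
```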
Custom electronics and digital signage for your business: www.evcircuits.com
RAID0 is unsuitable for situations that require very high uptime, but there is nothing inherently dangerous about storing real data on a RAID0 array. I know this gets said frequently, but RAID, at any level, isn't a backup. It's a reliability/performance feature. Even if you had configured these three disks as a triple-mirrored RAID1, you would be insane not to run SMART monitoring tools on the disks and even more insane not to have good backups. I don't know if I've ever had a disk fail without plenty of warning from SMART monitoring, so for RAID0, you are mostly gaining some performance at the expense of more difficult disk replacements. That seems like a very acceptable tradeoff for something like a gaming machine.
When was the last time you actually had to rebuild an array with large (4TB+) constituent disks?
And they don't read THAT slowly. Indeed, the increased (and sustained) load during the rebuild can cause additional drives in the array to fail.
Is the rebuild issue for SSD RAID-5 arrays the same stratum of risk it is for spinning rust?
I would presume not, both because of speed and because there's not nearly as much added stress from the intensive reads necessary to rebuild the array.
Double parity and/or a hot spare is better, but I kind of wonder whether, as SSDs gain write durability (or it becomes more accepted that they already have it, as some endurance tests have noted) and start popping up in more budget-minded arrays, RAID-5 might make a comeback due to its lower overhead and arguably lower risk from faster, less mechanically strained rebuilds.
Ummm...ok. So when your SMART detects a failing drive in your RAID0 array and you decide you want to replace it, how do you do that exactly? Oh, that's right: wipe the entire array and restore from backup, which, depending on the size of your array, can take anywhere from several hours to days, more if you decided to use your array to run the OS as well. RAID0 is just a plain terrible idea, period. It doesn't matter if you don't think you need uptime; an N-disk RAID0 is N times more likely to fail catastrophically than a standalone hard disk (assuming the failure rates of all the hard disks are equal), and without redundancy, getting back up and running is a long process.
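The "N times" figure is the small-probability approximation of a simple formula. Assuming member failures are independent, a RAID-0 of N drives is lost if any one fails, so P(array loss) = 1 - (1 - p)^N, which is close to N × p when p is small. A Python sketch with a hypothetical 2% annual failure rate per drive; correlated failures, discussed later in the thread for spinning drives, break the independence assumption.

```python
def p_raid0_loss(p_drive: float, drives: int) -> float:
    """Chance that a RAID-0 loses data in a period, assuming independent member failures."""
    return 1 - (1 - p_drive) ** drives


for n in (1, 2, 3, 4):
    exact = p_raid0_loss(0.02, n)
    print(f"{n} drive(s): exact {exact:.4f}, N*p approx {n * 0.02:.4f}")
```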
It depends on how you've setup your RAID0 array. If you are using mdraid, you can simply take the array offline, dd the contents of a failing disk onto a new disk, remove the old disk and bring the array back online (you can do this with a USB boot stick if you need to take the root filesystem offline). That's certainly more work than popping a failing disk out of a hotswap bay, screwing a new disk into the drive tray and pushing it back in. But, it's not that prohibitive.
Now, having said that, I certainly wouldn't build a RAID0 array out of a bunch of "green" desktop disks or out of bulk storage disks. I have a RAID0 array with 4 Intel 80GB SSDs and SMART says they have an online time of 4.3 years. They have incredible performance and never a hiccup. If I lost one of the disk controllers, the data could be restored in a few hours with a single rsync.
My point is that RAID0 has a place. It may not have a place in your setup but, it's not inherently flawed technology. It's a technology that is aimed at maximum performance with a bit more risk in downtime when compared to a single disk.
The other consideration is heat: reading and writing data that fast generates a lot of heat (as you can see on this page of TFA). Fitting the same heat load into a small enclosure would probably require some cooling (although this is a problem the computing industry has had to solve with practically every other component).
NVMe supports up to 2^16 queues with a queue depth of 2^16 each. I wonder when SSD controllers will start to take advantage of this new protocol. These first-gen M.2 drives seem more like retrofitted SATA drives that just so happen to have beefy controllers, but were not specifically designed to make full use of the new features.
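For a sense of what those queues are: each NVMe submission queue is a circular buffer in host memory, where the host adds commands at the tail and rings a doorbell, and the controller consumes from the head. The Python sketch below is a toy model of that ring, not driver code; doorbell registers, completion queues, and phase tags are left out, and the class name is hypothetical. A queue of size n holds at most n - 1 commands, because head == tail has to mean "empty".

```python
class SubmissionQueue:
    """Toy NVMe-style circular submission queue (host side writes, controller side reads)."""

    def __init__(self, size: int):
        if size < 2:
            raise ValueError("a queue needs at least 2 entries")
        self.size = size
        self.slots = [None] * size
        self.head = 0  # next entry the controller will fetch
        self.tail = 0  # next free entry the host will fill

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        # Full when advancing the tail would collide with the head.
        return (self.tail + 1) % self.size == self.head

    def submit(self, command) -> bool:
        """Host side: enqueue a command; returns False if the ring is full."""
        if self.is_full():
            return False
        self.slots[self.tail] = command
        self.tail = (self.tail + 1) % self.size
        return True

    def fetch(self):
        """Controller side: dequeue the oldest command, or None if empty."""
        if self.is_empty():
            return None
        command = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        return command


q = SubmissionQueue(4)
print([q.submit(f"read {i}") for i in range(4)])  # 3 accepted, the 4th is refused
print(q.fetch())                                  # oldest command comes out first
```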
Correct. SSDs fail for different reasons than spinning rust. Most mechanical HDs fail for physical reasons and physical reasons tend to be highly correlated for all drives in an array, even if they're different models or even brands. There is a very high risk that if one drive fails, another is right behind it. RAID5, I'm looking at you.
With RAID0, any drive failing is a loss, so multi-drive failures don't matter so much, but they're also much less likely unless it's a firmware bug or other pathological issue. And SSDs are pretty much RAID0 already and have a fraction of the failure rate of mechanical drives.
Astronomically small? It's happened to me TWICE in a couple years and I only have a single large raid array. It happens quite often -- and I'm using one of your LSI controllers (9280-16i4e).
Rebuild just finished... Yesterday. Took 5 days, and that is with 3TB disks.
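Those two numbers pin down the average rebuild speed. A Python sketch, assuming the rebuild had to cover one full 3 TB member at a constant rate (decimal units; the helper names are made up): the implied rate is only about 7 MB/s, far below a drive's sequential speed. Controllers commonly throttle rebuilds so the array stays usable, and Linux md exposes the equivalent limits as /proc/sys/dev/raid/speed_limit_min and speed_limit_max.

```python
SECONDS_PER_DAY = 86_400


def implied_rate_mb_s(capacity_tb: float, days: float) -> float:
    """Average throughput needed to rebuild capacity_tb terabytes in `days` days."""
    return capacity_tb * 1e12 / (days * SECONDS_PER_DAY) / 1e6


def rebuild_days(capacity_tb: float, rate_mb_s: float) -> float:
    """Days to rebuild capacity_tb terabytes at a steady rate_mb_s megabytes per second."""
    return capacity_tb * 1e12 / (rate_mb_s * 1e6) / SECONDS_PER_DAY


rate = implied_rate_mb_s(3, 5)
print(f"implied rebuild rate: {rate:.1f} MB/s")  # about 6.9 MB/s
for tb in (6, 8):
    print(f"{tb} TB member at that rate: {rebuild_days(tb, rate):.1f} days")
```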
What do I do? Well, I run a RAID-0 of SSD for my OS. I drop both the failing drive and a new drive into my drive duplicator, hit a button, and approximately 5 minutes later I put the new drive into the box and it's running again. That is of course if SMART detects it before failure which is ~50% of the time. Otherwise I wipe and restore from backups.... I'm guessing about 2-3 hours as I haven't had to do it yet.
For RAID-5, the big issue is "lose a drive on a large-enough array and you could be looking at an unrecoverable read error during the array recovery".
This gets repeated a lot, but isn't a problem for any halfway decent RAID setup because they slowly read data from the drives in the background (called patrol read on LSI/Dell controllers). The chances of a problem with a drive not turning up in one of the numerous patrol reads yet happening during a recovery are astronomically small.
I'm not sure how you define "astronomically", but I've seen this more than a few times in my career. And it has become increasingly common with larger disks and larger arrays.
RAID 5 is decent for availability... but you'd better be able to restore from your backups. RAID 6 should be the default these days (though I prefer ZFS RAIDZ2 or RAIDZ3). And don't be one of those idiots who makes a 32-disk, 192 TB RAID5 (or 6 for that matter).
SWM seeks new sig for a brief fling
5 days is a long time to hold one's breath...
I can totally agree with these sentiments. My first computer was a Commodore 64. Since then, I've been chasing the performance dragon, upgrading to a new computer every couple years. C128, Amiga, XT, 286, 386, 486, Athlon, P60, P2, P3, P4, P5... up until my last two computers.
My previous PC was a Core2Duo e6600 (circa 2006) and its performance was great and then still good. Had it for 5 years. I assembled my current desktop, a Core i7 2600k, in mid 2011. I've upgraded the video card (I'm a gamer) a couple times, and I was starting to feel a little performance pinch about 12-18 months ago, so I upgraded my OS drive to a 512GB SSD. I'm back to having zero performance problems except when fast traveling in Fallout 4 (my games drive is a mechanical hard drive.)
I've been considering another upgrade... but only if I can get M.2/NVMe and DDR4. I'm hoping that all of the announcements we heard months ago about new faster, affordable storage will pan out. Based on the hype, I'm hoping to get a few blazing TB of disk space for what mechanicals cost right now... but I won't hold my breath.
And that is why it's a RAID-6 and not a RAID-5. I've had a second disk fail during rebuilds twice so far.