Are RAID Controllers the Next Data Center Bottleneck?
storagedude writes "This article suggests that most RAID controllers are completely unprepared for solid state drives and parallel file systems, all but guaranteeing another I/O bottleneck in data centers and another round of fixes and upgrades. What's more, some unnamed RAID vendors don't seem to even want to hear about the problem. Quoting: 'Common wisdom has held until now that I/O is random. This may have been true for many applications and file system allocation methodologies in the recent past, but with new file system allocation methods, pNFS and most importantly SSDs, the world as we know it is changing fast. RAID storage vendors who say that IOPS are all that matters for their controllers will be wrong within the next 18 months, if they aren't already.'"
Hardware RAID's are not exactly hopping off the shelf and I think many shops are happy with fiberchannel.
Let's do another reality check: this is enterprise class hardware. Are you telling me you can get SSD RAID/SAN in a COTS package that is cost approximate to whatever is available now? Didn't think so....
Let's face it, in this class of hardware things move much more slowly.
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
FTA Since a disk sector is 512 bytes, requests would translate to 26.9 MB/sec if 55,000 IOPS were done with this size. On the other end of testing for small block random is 8192 byte I/O requests, which are likely the largest request sizes that are considered small block I/O, which translates into 429.7 MB/sec with 55,000 requests
I'm not going to believe an article that assumes that because you can do 55K IOPS for 512Byte reads, you can do the same number of IOPS for 8K reads which are 16X larger and then just extrapolate from there. Especially since most SSD's (at least SATA ones) right now top out around 200MB/s and the SATA interface tops out at 300MB/s. Besides there are already real world articles out there where guys with simple RAID0 SSD's are getting 500-600 MB with 3-4 drives using Motherboard RAID much less dedicated harware RAID.
Ah... pointing the finger at the storage... My favorite activity. Listening to DBAs, application writers, etc point the finger at the EMC DMX with 256GB of mirrored cache and 4Gb/s FC interfaces. You point your finger and say, "I need 8Gb FibreChannel!. Yet when I look at your hba utilization over a 3mo period (including quarter end, month end etc..) I see you averaging a paltry 100MB/s. Wow. Guess I could have saved thousands of dollars with going with 2Gb/s HBAs. Oh yeah, and you have a minimum of two HBAs per server. Running a nagios application to poll our switchports for utilization, the average host is running maybe 20% utilization of the link speed, and as you beg, "Gimme 8Gb/s FC", I look forward to your 10% utilization.
We've taken whole databases and loaded them into dedicated cache drives on the array, and surprise, no performance increase. DBAs and application writers have gotten so used to yelling, "Add Hardware! That they forgot how to optimize their applications and sql queries."
If storage was the bottleneck, I wouldn't be loading up storage ports (FAs) with 10-15 servers. I find it funny that the only devices on my 10,000 port SAN that can sufficiently drive IO are media servers and the tape drives (LTO-4) that they push.
If storage was the bottleneck there would be no oversubscription in the SAN or disk array. Let me know when you demand a single storage port per HBA, and I'm sure my EMC will take us all out to lunch.
I have more data than you. :)
Ah... pointing the finger at the storage... My favorite activity. Listening to DBAs, application writers, etc point the finger at the EMC DMX with 256GB of mirrored cache and 4Gb/s FC interfaces. You point your finger and say, "I need 8Gb FibreChannel!. Yet when I look at your hba utilization over a 3mo period (including quarter end, month end etc..) I see you averaging a paltry 100MB/s. Wow. Guess I could have saved thousands of dollars with going with 2Gb/s HBAs. Oh yeah, and you have a minimum of two HBAs per server. Running a nagios application to poll our switchports for utilization, the average host is running maybe 20% utilization of the link speed, and as you beg, "Gimme 8Gb/s FC", I look forward to your 10% utilization.
You do sound like you know what you're doing, but there is quite a difference between average utilization and peak utilization. I have some servers that average less than 5% usage on a daily basis, but will briefly max out the connection about 5-6 times per day. For some applications, more peak speed does matter.
All the important operations tend to be random. For a file server, you may have twenty people accessing files simultaneously. Or a hundred, or a thousand. For a webserver, it'll be hitting dozens or hundreds of static pages and, if you have database backend, that's almost entirely random as well.
For people consolidating physical servers to virtual servers, you now have two, three, ten or twenty VMs running on one machine. If every one of those VMs tries to do a "sequential" IO, it gets interlaced by the hypervisor into all the other sequential IOs. No hypervisor would dare tell all the other VMs to sit back and wait so that every IO is sequential. That delay could be seconds or minutes or hours.
Now imagine all that, and take into account that the latest Intel SSD gets around 6600 IOPS read and write. A good, fast hard drive gets 200. So you could put thirty three hard drives in RAID 0 and have the same number of IOPS, and your latency would still be worse. All the RAID0 really does for you is give you a nice big queue pipeline, like in a CPU. Your IO doesn't really get done faster, but you can have many more running simultaneously.
Given that SSDs are easily three to four times faster on sequential IO and an order of magnitude faster on random IO, I don't think it's that implausible to believe that the industry isn't ready.
Sort of true, but not entirely accurate.
Is the on-demand response slow? Stats lie. Stats mislead. Stats are only stats. The systems I'm monitoring would use more I/O if they could. Those basic read/write graphs are just the start. How's the latency? Any errors? Pathing setup good? Are the systems queuing i/o requests while waiting for i/o service response?
And traffic is almost always bursty unless the link is maxed - you're checking out a nice graph of the maximums too, I hope? That average looks mighty deceiving when long periods are compressed. At an extreme over months or years, data points can be days. Overnight + workday could = 50%. No big deal on the average.
I have a similiar usage situation on many systems, but the limits are generally still storage dependent issues like i/o latency (apps make a limited number of requests before requests start queuing), poorly grown storage (a few luns there, a few here, everything is suddenly slowing down due to striping in one over-subscribed drawer), and sometimes unexpected network latency on the SAN (switch bottlenecks on the path to the storage).
Those graphs of i/o may look pitiful, but perhaps that's only because the poor servers can't get the data any faster.
Older enterprise SAN units (even just 4 or 5 years ago) kinda suck performance wise. The specs are lies in the real world. A newer unit, newer drives, newer connects and just like a server, you'll be shocked. What'cha know, those 4Gb cards are good for 4Gb after all!
Every year, there's a few changes and growth, just like in every other tech sector.
-- Life is good. Tastes like chicken.