Slashdot Mirror


Are RAID Controllers the Next Data Center Bottleneck?

storagedude writes "This article suggests that most RAID controllers are completely unprepared for solid state drives and parallel file systems, all but guaranteeing another I/O bottleneck in data centers and another round of fixes and upgrades. What's more, some unnamed RAID vendors don't seem to even want to hear about the problem. Quoting: 'Common wisdom has held until now that I/O is random. This may have been true for many applications and file system allocation methodologies in the recent past, but with new file system allocation methods, pNFS and most importantly SSDs, the world as we know it is changing fast. RAID storage vendors who say that IOPS are all that matters for their controllers will be wrong within the next 18 months, if they aren't already.'"

171 comments

  1. I/O is random? What have you been smoking? by Anonymous Coward · · Score: 1, Interesting

    It is very common when doing disk benchmarks to have separate tests for small random reads/writes and large sequential reads/writes. The numbers are often different.

    And while you can't always predict what disk sector is going to be read next, often you can, which is why predictive raid controllers with lots of memory are very useful.

    I think we need a mod option to mod down the article summary: -1, stupid editor.

    1. Re:I/O is random? What have you been smoking? by countertrolling · · Score: 3, Insightful

      I think we need a mod option to mod down the article summary: -1, stupid editor.

      You had your chance.

      --
      For justice, we must go to Don Corleone
    2. Re:I/O is random? What have you been smoking? by Anpheus · · Score: 5, Insightful

      All the important operations tend to be random. For a file server, you may have twenty people accessing files simultaneously. Or a hundred, or a thousand. For a webserver, it'll be hitting dozens or hundreds of static pages and, if you have a database backend, that's almost entirely random as well.

      For people consolidating physical servers to virtual servers, you now have two, three, ten or twenty VMs running on one machine. If every one of those VMs tries to do a "sequential" IO, it gets interlaced by the hypervisor into all the other sequential IOs. No hypervisor would dare tell all the other VMs to sit back and wait so that every IO is sequential. That delay could be seconds or minutes or hours.

      Now imagine all that, and take into account that the latest Intel SSD gets around 6600 IOPS read and write. A good, fast hard drive gets 200. So you could put thirty three hard drives in RAID 0 and have the same number of IOPS, and your latency would still be worse. All the RAID0 really does for you is give you a nice big queue pipeline, like in a CPU. Your IO doesn't really get done faster, but you can have many more running simultaneously.

      Given that SSDs are easily three to four times faster on sequential IO and an order of magnitude faster on random IO, I don't think it's that implausible to believe that the industry isn't ready.
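
      A rough sketch of the arithmetic in the comment above, using the poster's own figures (6600 IOPS for the SSD, 200 for a hard drive). The function names are made up for illustration, and treating per-request latency as 1/IOPS assumes one outstanding request per drive, which is a simplification.

```python
# Figures quoted in the comment above; both are the poster's rough numbers.
SSD_IOPS = 6600
HDD_IOPS = 200


def drives_needed(target_iops: int, per_drive_iops: int) -> int:
    """Smallest number of striped drives whose combined IOPS reach the target."""
    return -(-target_iops // per_drive_iops)  # ceiling division


def service_time_ms(per_drive_iops: int) -> float:
    """Rough per-request latency, assuming one request in flight per drive."""
    return 1000.0 / per_drive_iops


n = drives_needed(SSD_IOPS, HDD_IOPS)
print(f"{n} hard drives needed to match one SSD on aggregate IOPS")
print(f"HDD time per request: ~{service_time_ms(HDD_IOPS):.2f} ms")
print(f"SSD time per request: ~{service_time_ms(SSD_IOPS):.2f} ms")
```

      Under this simplified model, striping multiplies how many requests can be in flight at once but leaves the time each individual request takes unchanged, which is the "queue pipeline" point.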

    3. Re:I/O is random? What have you been smoking? by symbolset · · Score: 1

      Agree. For VM image files you may want to consider something else. The new PCIe-attached SSD cards come in sizes up to 1TB and have IOPS over 250,000. Streaming is likewise fast, and latency is very low. Which is nice.

      --
      Help stamp out iliturcy.
    4. Re:I/O is random? What have you been smoking? by Sillygates · · Score: 1

      So you could put thirty three hard drives in RAID 0 and have the same number of IOPS, and your latency would still be worse.

      Actually, that's incorrect. Here's why:

      When you calculate IOPS, a good portion of small reads and writes get executed at random places on the disks. When you make one filesystem write on a raid0 set (depending on how smart the raid0 controller is), it will be locking up several or ALL the disk spindles for that individual read/write.

      The IOPS are negligibly better on a 33 disk raid0 set, and depending on your disk controller, it might be worse (every write equates to 33 dma requests).

      It is faster for reading large files though, but that is NOT what a fair IOPS test measures.

      For read operations, you can double your read IOPS by using a mirror. This is because any semi-decent controller will split all the read requests between the drives in the mirror. When you're issuing lots of read requests from several threads, the load should be approximately equal across the drives.

      --
      I fear the Y2038 bug
    5. Re:I/O is random? What have you been smoking? by Dahamma · · Score: 1

      Good points, though of course some problems are more a matter of server design/allocation than any gross inadequacy on the part of the RAID controller. You can always try faster hardware to solve a performance problem, but a lot of the time it's just due to bad software/configuration.

      For example, no one in their right mind would share physical disks between 10-20 VMs in any application where disk performance is critical - a good server architect builds a system that works with the hardware available. Problem is, plenty of these applications/servers are not built by people in their right mind :)

    6. Re:I/O is random? What have you been smoking? by wagnerrp · · Score: 2

      When you calculate IOPS, a good portion of small reads and writes get executed at random places on the disks. When you make one filesystem write on a raid0 set (depending on how smart the raid0 controller is), it will be locking up several or ALL the disk spindles for that individual read/write.

      Actually, that's incorrect. Here's why:

      When you make a RAID0 array, you stripe large blocks between all the disks, usually 64K-256K large. If your operation does not cross the block boundary, you only access a single drive. Assuming those random small files are evenly distributed, your IOPS scale almost linearly with drive count.
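
      A minimal sketch of the chunk mapping described above, assuming the simplest RAID0 layout in which chunk k lives on disk k mod N. `disks_touched` is a hypothetical helper, not any controller's actual algorithm, and the 64K chunk size is just the low end of the range quoted above.

```python
def disks_touched(offset: int, length: int, chunk_size: int, n_disks: int) -> set:
    """Return the member-disk indexes a request touches in a simple RAID0 layout."""
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # Assumed layout: chunk k is stored on disk (k mod n_disks).
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}


CHUNK = 64 * 1024   # 64K stripe chunk (assumed)
DISKS = 33          # array size from the earlier comment

# A 4K read inside one chunk only keeps a single spindle busy.
print(disks_touched(offset=0, length=4096, chunk_size=CHUNK, n_disks=DISKS))

# A 4K read straddling a chunk boundary touches two neighbouring disks.
print(disks_touched(offset=CHUNK - 2048, length=4096, chunk_size=CHUNK, n_disks=DISKS))

# Only a request spanning a full stripe involves every disk.
print(len(disks_touched(offset=0, length=CHUNK * DISKS, chunk_size=CHUNK, n_disks=DISKS)))
```

      With small requests landing on different disks, the other spindles stay free to serve other requests, which is why IOPS can scale close to linearly in this model.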

    7. Re:I/O is random? What have you been smoking? by BikeHelmet · · Score: 1

      I'd rather have an ioDrive.

      See: http://hothardware.com/Articles/Fusionio-vs-Intel-X25M-SSD-RAID-Grudge-Match/?page=9

      With ludicrously high IOPS, your CPU doesn't have to do much waiting, which pretty much defeats any RAID solution. RAID usually raises overhead, because your CPU has to decide which device the requests go to - unless you use expensive hardware RAID controllers, all of which have IOPS caps. Most RAID solutions also go through slower interfaces - although compared to PCIe 2.0 4x, every interface (SATA1/2/3, USB2/3, etc.) is slow.

      HDDs are impressive tech, but they have a different purpose: density and longevity. SSDs are really going to shine for database stuff in the future. Prices are dropping rapidly, and they're almost here!

    8. Re:I/O is random? What have you been smoking? by OnlyPostsWhilstDrunk · · Score: 1

      All the important operations tend to be random.

      I can only assume you wrote that backwords, lest my weasel trousers go to waste.

      --
      Sig: I don't spell check and this is legit. This was written while I was drunk, and quite possibly with m eyes closed, b
    9. Re:I/O is random? What have you been smoking? by Quantos · · Score: 1

      With today's technology SSD isn't ready for SSD. It's poorly controlled. How much more of a bottleneck do you need? Does anybody remember the machines with shitloads of RAM and not a HDD in sight?

      --
      Some people are only alive because it's against the law for me to hunt them down and kill them.
    10. Re:I/O is random? What have you been smoking? by Lennie · · Score: 1

      Hmm, maybe I don't even want to know the pricetag on that

      --
      New things are always on the horizon
    11. Re:I/O is random? What have you been smoking? by badkarmadayaccount · · Score: 1

      Another reason we need a general-purpose I/O co-processor. Maybe a simple blitter chip with MMIO would suffice.
      </AmigaFan type="16 y.o.">

      --
      I know tobacco is bad for you, so I smoke weed with crack.
  2. distibution by ArsonSmith · · Score: 1

    With things like Hadoop and CloudStore, pNFS, Lustre, and others, storage will be distributed. There will no longer be the huge EMC, NetApp, Hitachi, etc. central storage devices. There's no reason to pay big bucks for a giant single point of failure when you can use the Linus method of upload to the internet and let it get mirrored around the world. (In a much more localized manor.)

    --
    Paying taxes to buy civilization is like paying a hooker to buy love.
    1. Re:distibution by bschorr · · Score: 3, Insightful

      That's fine for some things but I really don't want my confidential client work-product mirrored around the world. Despite all the cloud hype there is still a subset of data that I really do NOT want to let outside my corporate walls.

      --
      -B-
    2. Re:distibution by Ex-MislTech · · Score: 2, Informative

      This is correct. There are laws on the books in most countries that prohibit exposing medical and other data to risk by putting it out in the open. Some have even moved to private virtual circuits, and SANs with fast solid-state storage for active files work fine; they move less-accessed data to drive storage, which is nonetheless quite fast, and SAS technology is faster than SCSI tech in throughput.

      --
      google "32 trillion offshore needs IRS attention"
    3. Re:distibution by Ex-MislTech · · Score: 2, Informative

      An example of SAS throughput pushing out 6 Gbps.

      http://www.pmc-sierra.com/sas6g/performance.php

      --
      google "32 trillion offshore needs IRS attention"
    4. Re:distibution by rubycodez · · Score: 1

      eh, properly designed systems using the big disk arrays certainly don't have a single point of failure. And their data is replicated to other big disk arrays in other locations. That's why they cost "the big bucks". Your cloud is fine for relatively low-speed low-security read-mostly data, but not for high-volume financial and healthcare systems

    5. Re:distibution by mysidia · · Score: 1

      You could encrypt all the data using AES-512 or something stronger with various keys.

      So you mirror the data all around the world, while concentrating on securing the encryption keys, which are a lot smaller than the data, and easier to distribute to secure locations only.

    6. Re:distibution by lgw · · Score: 2, Informative

      SAS technology is faster than SCSI tech in throughput

      "SCSI" does not mean "parallel cable"!

      Sorry, pet peeve, but obviously Serial Attached SCSI (SAS) is SCSI. All Fibre Channel storage speaks SCSI (the command set), and all USB storage too. And iSCSI? Take a wild guess. Solid state hard drives that plug directly into PCIe slots with no other data bus? Still SCSI command set. Fast SATA drives? The high end ones often have a SATA-to-SCSI bridge chip in front of SCSI internals (and SAS can use SATA cabling anyhow these days).

      Pardon me, I'll just be over here grumbling about this.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    7. Re:distibution by SuperQ · · Score: 1

      Uhh, are you dense? Distributed storage doesn't mean you use someone else's servers. The software mentioned above is for internal use. Hadoop is used by Yahoo for their internal cloud, and Lustre is used by a number of scientific labs that do military work.

    8. Re:distibution by lgw · · Score: 2

      For my own personal data, I'd consider that adequate. For data I'm legally required to keep secret - absolutely not. Your physical security design should force an attacker to steal both your keys and your data, each from a separate physical location, so that you can destroy one as soon as the other is stolen to prevent data loss. Electronic security of course focuses on compartmentalization and auditing, so that an inside attacker can only steal a small portion of the data, and can be caught and jailed afterwards. That's all pretty basic design.

      Also, 256-bit symmetric encryption really is enough - it's firmly beyond the realm of what can be brute-forced, unless some fundamental understanding of physics is wrong. 256-bit AES is only vulnerable to weaknesses in the algorithm being discovered at some future point. If you're paranoid, you're far better off using 2 unrelated 256-bit symmetric algorithms than a symmetric key larger than 256 bits.
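
      For a sense of scale on the brute-force claim, a back-of-the-envelope sketch. The guessing rate of 10^18 keys per second is an arbitrary, deliberately generous assumption, not a measured figure for any real hardware, and `years_to_search` is a made-up helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Years needed to try every key of the given length at a fixed rate."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 10 ** 18  # hypothetical guesses per second, chosen to be generous

for bits in (128, 256):
    years = years_to_search(bits, ASSUMED_RATE)
    print(f"{bits}-bit key: ~{years:.2e} years to exhaust the keyspace")
```

      Even at that assumed rate the 256-bit figure comes out dozens of orders of magnitude beyond any practical timescale, so exhaustive search is not the realistic threat; algorithmic weaknesses and poor key handling are.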

      --
      Socialism: a lie told by totalitarians and believed by fools.
    9. Re:distibution by Anonymous Coward · · Score: 1, Funny

      "SCSI" does not mean "parallel cable"!

      Ok, yes, you're correct.

      But the common meaning of SCSI is "parallel SCSI", because for most of SCSI's existence parallel SCSI was the only option.

      Similarly, ATA does technically include both parallel ATA and serial ATA. But the common meaning of ATA is parallel ATA, because for most of ATA's existence parallel ATA was the only option.

      Pardon me, I'll just be over here grumbling about this.

      You kids get off my lawn!

    10. Re:distibution by Chris+Daniel · · Score: 1

      In a much more localized manor

      We're going to start putting data centers in big houses now?

      --
      Don't blame me -- I voted for Roslin.
    11. Re:distibution by sjames · · Score: 1

      That's what strong crypto is for.

    12. Re:distibution by mysidia · · Score: 1

      It's more convenient to have the data in multiple places and divide the keys.

      A 512 key basically provides you a security guarantee.

      You can divide your 512-bit keys in half, and place half of the bits for each key at different places. Either you use two 256-bit keys and just string the bits together, or you XOR the key with a 512-bit random number, and store only that random number in one place, and the XOR result in the other place.

      Then your security is actually greater than if you just secured the data and an encryption key separately; you have 3 isolated places where you have keys or data.

      If a compromise of any 1 becomes known to you, you destroy all 3.

      Also, you could use different keys for each different place you have stored a copy of the data (that means you encrypt the data again every time you store it in a different place).

      That way, should you destroy one location's keys, you retain redundant access to your other copies of the data.
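
      A minimal sketch of the XOR split described above, using Python's standard `secrets` module for the random pad. `split_key` and `join_key` are illustrative names, and this only demonstrates the two-share idea; it is not a vetted key-management scheme.

```python
import secrets


def split_key(key: bytes):
    """Split a key into two shares: a random pad and the key XORed with that pad."""
    pad = secrets.token_bytes(len(key))
    masked = bytes(k ^ p for k, p in zip(key, pad))
    return pad, masked


def join_key(pad: bytes, masked: bytes) -> bytes:
    """Recombine the two shares; either share alone reveals nothing about the key."""
    if len(pad) != len(masked):
        raise ValueError("shares must be the same length")
    return bytes(p ^ m for p, m in zip(pad, masked))


key = secrets.token_bytes(64)          # 512 bits, matching the size in the comment
share_a, share_b = split_key(key)      # store these in two different places
assert join_key(share_a, share_b) == key
```

      Because the pad is uniformly random, each share on its own is statistically independent of the key, so an attacker needs both locations plus the data.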

    13. Re:distibution by Anonymous Coward · · Score: 0

      256-bit AES is only vulnerable to weaknesses in the algorithm being discovered at some future point.

      You're also vulnerable to initially choosing a weak key with not a lot of randomness. Many encryption systems have used strong algorithms but fallen victim to this flaw.

      Debian recently had this kind of flaw in SSL certificate & key generation.

    14. Re:distibution by lgw · · Score: 1

      A 512 key basically provides you a security guarantee.

      Only salesmen talk about guarantees in security. Everything is vulnerable, it's just a question of effort.

      There are several multi-key solutions you can buy from reputable vendors for at-rest data encryption (3 of 5 keycards needed, or 2 of 5, or whatever). That's a good approach to protection against insiders. It wouldn't justify making the at-rest data publicly accessible, nor failing to compartmentalize access.

      And yeah, the "Store the key and the data in different buildings" approach is just one security consideration, you obviously want data mirroring, backups, disaster recovery plans, etc, etc. But you want the same degree of security for each copy of the data, not the formerly common "ultra-secure data backed up to unencrypted tapes and sent by UPS" approach.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    15. Re:distibution by sjames · · Score: 1

      SCSI started life as a command set AND a physical signaling specification. The physical has evolved several times, but until recently was easily recognizable as a natural evolution of the original parallel SCSI. At the cost of a performance degradation and additional limitations (such as number of devices), the generations of SCSI have interoperated with simple adapters.

      SaS uses the same command set, but the physical is a radical departure (that is, it bears no resemblance) from the original SCSI and its descendants. Arguably, if you're going to call SaS SCSI because of the command set, you'll have to also call USB SCSI. We call drives and controllers that speak the SCSI command set over fibre Fibre Channel. Drives that speak the SCSI command set over a high speed serial layer are called SaS. Drives that speak SCSI command set over USB are called USB. The devices can only interoperate with active translation. A simple connector adapter won't do it.

      We call controllers and drives that speak the ATA command set over a fast serial bus SATA. As it turns out, because the command sets are so similar and the physical specs are close, SaS controllers are bi-lingual and can also speak ATA command set to SATA drives.

      So, no, SaS drives are not obviously SCSI any more than a USB drive is obviously SCSI. SaS devices obviously speak the SCSI command set and are obviously targeted as the successor to SCSI.

    16. Re:distibution by lgw · · Score: 1

      I remember the days when people reading Slashdot wanted to use precise terminology about technology - don't you? Sure you do. But go on with your "Serial attached SCSI drives are not SCSI" and your "I double-clicked on the internet, but it's broken" and so on. Those of us who are still nerds will pedantically point out that all these storage technologies are "really SCSI drives, if you look closely" and we'll be right. Grumble grumble grumble.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    17. Re:distibution by Score+Whore · · Score: 1

      So, no, SaS drives are not obviously SCSI any more than a USB drive is obviously SCSI.

      In my world the T10 Technical Committee defines SCSI and to them SAS is a SCSI protocol. QED.

      BTW, wtf is SaS?

    18. Re:distibution by jon3k · · Score: 1

      So you think your network is more secure than storing data on Google's servers. Interesting.

    19. Re:distibution by mysidia · · Score: 1

      256-bit AES is only vulnerable to weaknesses in the algorithm being discovered at some future point.

      If you are truly concerned about security, you want to be prepared for the possibility of a weakness in the algorithm, and that means using a key that is amply strong enough to withstand the anticipated discovery of weaknesses in the algorithm that reduce its strength, without even a remote hardly conceivable risk to the security of the data.

    20. Re:distibution by Anonymous Coward · · Score: 0

      Uh, Linus didn't invent any of those technologies.

      I also liked the way you lumped in Hadoop and CloudStore with pNFS, as if either of those crappy filesystems could seriously compete with pNFS.

    21. Re:distibution by badkarmadayaccount · · Score: 1

      IOW, SCSI commands in ATA packets (via ATAPI spec) in Ethernet frames is a better choice. I never got what the hype is around byzantine SAN back-bones. There are specs for wrapping the stuff in Ethernet, then just let 'er rip in some phat 10 Gbps pipes. Anyone with me?

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    22. Re:distibution by bschorr · · Score: 1

      Yes. It is.

      For one thing I know personally each and every person with physical access to our servers. I handed them their keys to the room.

      I have not the slightest idea who can access the servers at Google that are storing my data.

      Secondly when Google has an outage (as they sometimes do) my data is still perfectly accessible to me on my servers.

      I don't have to worry about somebody at Google misconfiguring something and inadvertently exposing my data to people it wasn't intended for (as they did not too long ago).

      The "cloud" is fine for storing your kid's soccer schedule and grandma's brownie recipe. If you think I'm going to advise my clients to store confidential client work product on some anonymous server in god-knows-what country you can forget it.

      --
      -B-
  3. Wait. You mean my SAN is Dead? by mpapet · · Score: 4, Insightful

    Hardware RAIDs are not exactly hopping off the shelf, and I think many shops are happy with Fibre Channel.

    Let's do another reality check: this is enterprise-class hardware. Are you telling me you can get SSD RAID/SAN in a COTS package at a cost approximating whatever is available now? Didn't think so....

    Let's face it, in this class of hardware things move much more slowly.

    --
    http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
  4. That's ok by Anonymous Coward · · Score: 0

    This article suggests that most RAID controllers are completely unprepared for solid state drives and parallel file systems

    Right. The point of a parallel file system is that you do not need RAID. Slashdot's editors must think really low of their readers.

    1. Re:That's ok by Jafafa+Hots · · Score: 1

      The READERS think low of the readers, why should the editors be any different?

      --
      This space available.
    2. Re:That's ok by Troy+Baer · · Score: 1

      The point of a parallel file system is that you do not need RAID.

      Really? Why has virtually every production parallel file system implementation I've ever seen (using GPFS, Lustre, and PVFS) been done on top of hardware RAID controllers?

      --
      "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  5. BAD MATH by adisakp · · Score: 5, Interesting

    FTA Since a disk sector is 512 bytes, requests would translate to 26.9 MB/sec if 55,000 IOPS were done with this size. On the other end of testing for small block random is 8192 byte I/O requests, which are likely the largest request sizes that are considered small block I/O, which translates into 429.7 MB/sec with 55,000 requests

    I'm not going to believe an article that assumes that because you can do 55K IOPS for 512-byte reads, you can do the same number of IOPS for 8K reads, which are 16X larger, and then just extrapolate from there. Especially since most SSDs (at least SATA ones) right now top out around 200MB/s and the SATA interface tops out at 300MB/s. Besides, there are already real world articles out there where guys with simple RAID0 SSD's are getting 500-600 MB with 3-4 drives using Motherboard RAID much less dedicated hardware RAID.
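
    The article's two figures can be reproduced as IOPS times request size, and the same arithmetic shows why the 8K case is implausible on a single SATA link. This is only a sketch: the helper names are made up, the article's "MB" is assumed to mean 2^20 bytes (which is what reproduces its numbers), and the 300MB/s line rate is the figure quoted above.

```python
MIB = 1024 * 1024


def throughput_mib_s(iops: int, request_bytes: int) -> float:
    """Throughput implied by a given IOPS figure at a given request size."""
    return iops * request_bytes / MIB


def max_iops_at_line_rate(line_rate_bytes_s: int, request_bytes: int) -> int:
    """Upper bound on IOPS that an interface's line rate allows at a request size."""
    return line_rate_bytes_s // request_bytes


IOPS = 55_000
print(f"512 B requests: {throughput_mib_s(IOPS, 512):.1f} MB/s")
print(f"8192 B requests: {throughput_mib_s(IOPS, 8192):.1f} MB/s")

SATA_LINE_RATE = 300_000_000  # 300MB/s interface limit quoted in the comment
print("8K IOPS ceiling on one SATA link:", max_iops_at_line_rate(SATA_LINE_RATE, 8192))
```

    Holding IOPS constant while growing the request 16X implies a throughput well above what one SATA link can carry, so the IOPS figure has to drop as requests get larger.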

    1. Re:BAD MATH by fuzzyfuzzyfungus · · Score: 4, Insightful

      "simple RAID0 SSD's are getting 500-600 MB with 3-4 drives using Motherboard RAID much less dedicated harware RAID."

      The last part of that sentence is particularly interesting in the context of this article. "Motherboard RAID" is, outside of the very highest end motherboards, usually just bog-standard software RAID with just enough BIOS goo to make it bootable. Hardware RAID, by contrast, actually has its own little processor and does the work itself. Of late, general purpose microprocessors have been getting faster, and cores in common systems have been getting more numerous, at a substantially greater rate than hardware RAID cards have been getting spec bumps (outside of the super high end stuff, I'm not talking about whatever EMC is connecting 256 fibre channel drives to, I'm talking about anything you could get for less than $1,500 and shove in a PCIe slot). Perhaps more importantly, the sophistication of OS support for nontrivial multi-disk configurations (software RAID, ZFS, storage pools, etc.) has been getting steadily greater and more mature, with a good deal of competition between OSes and vendors. RAID cards, by contrast, leave you stuck with whatever firmware updates the vendor deigns to give you.

      I'd be inclined to suspect that, for a great many applications, dedicated hardware RAID will die (the performance and uptime of a $1,000 server with a $500 RAID card will be worse than a $1,500 server with software RAID, for instance) or be replaced by software RAID with coprocessor support (in the same way that encryption is generally handled by the OS, in software; but can be supplemented with crypto accelerator cards if desired).

      Dedicated RAID of various flavors probably will hang on in high end applications (just as high end switches and routers typically still have loads of custom ASICs and secret sauce, while low end ones are typically just embedded *nix boxes on commodity architectures); but the low end seems increasingly hostile.

    2. Re:BAD MATH by SuperQ · · Score: 1

      It doesn't matter that SATA can do 300MB/s. That's just the interface line rate. Last I did benchmarks of 1TB drives (Seagate ES.2) they topped out at around 100MB/s. Drives still have a long way to go before they saturate the SATA bus. The only way that happens is if you are using port multipliers to reduce the number of host channels.

    3. Re:BAD MATH by jon3k · · Score: 1

      Vertex (with Indilinx controllers) and Intel (even the "cheap" MLC drives from both vendors that are less than $3.00/GB) are seeing 250MB/s-270MB/s real world results for reads. The usable throughput of SATA 3G is slightly less than 300MB/s, so essentially we're at the limitation of SATA 3G, or very very close -- too close for comfort.

    4. Re:BAD MATH by jon3k · · Score: 2, Interesting

      You forgot about SSDs, consumer versions of which are already doing over 250MB/s reads for less than $3.00/GB. And we're still essentially talking about second generation products (Vertex switched from JMicron to Indilinx controllers and Intel basically just shrunk down to 34nm for their new ones, although their old version did 250MB/s as well).

      I'm using a 30GB OCZ Vertex for my main drive on my Windows machine and it benchmarks around 230MB/s _AVERAGE_ read speed. It cost $130 ($4.33/GB) when I bought it a couple months ago, and prices are falling. The new Intel X25-M is $225 for 80GB ($2.81/GB).
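
      The per-gigabyte figures follow directly from the prices and capacities quoted above; a quick check, with `cost_per_gb` as an illustrative helper name.

```python
def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Price divided by capacity, in dollars per gigabyte."""
    return price_usd / capacity_gb


# Prices and capacities as quoted in the comment above.
drives = {
    "OCZ Vertex 30GB": (130, 30),
    "Intel X25-M 80GB": (225, 80),
}

for name, (price, capacity) in drives.items():
    print(f"{name}: ${cost_per_gb(price, capacity):.2f}/GB")
```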

    5. Re:BAD MATH by Rockoon · · Score: 1

      Remember that Intel entered this market as a tiger out for blood with their *first* SSD throwing data at just under the SATA300 cap. This isn't a coincidence.

      When SATA600 goes live, expect Intel and OCZ to jump right up to the 520MB/sec area as if it was trivial to do so... (because it is!)
      Fusion-io has a PCIe flash solution that goes several times faster than these SATA300 SSDs. The problem is SATA. The problem is SATA. The problem is SATA.

      --
      "His name was James Damore."
    6. Re:BAD MATH by drsmithy · · Score: 1

      Besides there are already real world articles out there where guys with simple RAID0 SSD's are getting 500-600 MB with 3-4 drives using Motherboard RAID much less dedicated hardware RAID.

      It is unlikely "dedicated hardware RAID" would be meaningfully faster.

    7. Re:BAD MATH by mysidia · · Score: 1

      for instance) or be replaced by software RAID with coprocessor support (in the same way that encryption is generally handled by the OS, in software; but can be supplemented with crypto accelerator cards if desired).

      I don't think so. The bus tax imposed by piping the bits to a dedicated processor for checksum offloading is as great or greater than the actual processing load itself, and the load can adversely affect system performance.

      If hardware RAID dies, then I think RAID5 dies also, in favor of RAID1 or newer non-traditional-RAID redundancy algorithms more suitable for implementation in software; algorithms such as ZFS mirroring, which don't suffer problems such as the "RAID5 write hole", or performance penalties, since there's no such thing as 'partial stripe writes'.

    8. Re:BAD MATH by sirsnork · · Score: 1

      The current problems with "motherboard RAID" are:

      1. They can't take a BBU, so you either leave write caching turned on on the drives and lose data on an unexpected shutdown (possibly corrupting your array)

      OR

      Turn write caching off on the drives and have incredibly poor write speeds.

      2. The software (and probably the hardware) is nowhere near smart enough. They might tell you a drive is failing, they might not. If they do they might rebuild the array successfully or may just corrupt it (and if it's your boot drive then there goes your OS too). They are just far too unreliable when a failure does occur.

      3. As has been pointed out by Alan Cox on LKML, a lot of the drivers and hardware don't do checksums when they should, so you could also get silent data corruption.

      Overall I agree, hardware RAID controllers are very, very slow to get spec upgrades and this is going to be a problem going forward unless that changes, but "motherboard RAID" is by no means the solution. This is actually a selling point for real external storage (EMC etc) because it is FAST and RELIABLE.

      --

      Normal people worry me!
    9. Re:BAD MATH by afidel · · Score: 1

      You miss the real world cost of using software RAID: licensing. Most applications that justify $500 RAID controllers are licensed on a per-CPU or per-core model, which means using the CPU to accelerate I/O is really freaking expensive. Oracle Enterprise lists for $60k/2 cores on Linux and Windows, which means if you max out a core doing I/O you have spent $30k to do I/O; you better be getting some phenomenal rates for that kind of money. Plus hardware RAID supplies something that no software RAID implementation can: battery backed write cache. I agree that hardware RAID coprocessors need to get significantly better to keep up with the demands that SSDs will put on them, but that is quite separate from eliminating hardware RAID altogether; for $500 they should be able to put out a board with a few hundred megs of memory, a fast processor, and a battery.
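
      The licensing arithmetic above, spelled out as a sketch. The $60k-per-2-cores list price and the $500 controller price are the figures from the comment; the function name is made up, and assuming that license cost scales linearly with cores consumed is a simplification of how such licenses are actually sold.

```python
def license_cost_of_io(list_price_usd: float, cores_per_license: int,
                       cores_spent_on_io: float) -> float:
    """License dollars effectively consumed by cores busy doing I/O instead of app work."""
    per_core = list_price_usd / cores_per_license
    return per_core * cores_spent_on_io


LIST_PRICE = 60_000      # per 2 cores, as quoted above
RAID_CARD_PRICE = 500    # hardware controller price quoted above

cost = license_cost_of_io(LIST_PRICE, cores_per_license=2, cores_spent_on_io=1)
print(f"One core saturated by software RAID: ${cost:,.0f} of licensed capacity")
print(f"That is {cost / RAID_CARD_PRICE:.0f}x the price of the RAID card")
```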

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    10. Re:BAD MATH by afidel · · Score: 1

      Consumer drives can peak at those kinds of rates but they would fall over in a month or six if they were asked to sustain writes at anywhere near those rates. The only inexpensive SSD I would trust (and have trusted) my data to is the Intel X25-E, and it comes in at $15/GB, which doesn't compare very favorably for most loads with the still expensive $2/GB for FC HDDs. Things may change in a few generations when SLC rates come down to the $3/GB range, but high end HDDs will probably be at 25c per GB by then.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    11. Re:BAD MATH by jon3k · · Score: 1

      Please show me a SAN vendor that sells their certified fiber channel drives for $2/GB. Try more like $6-$10/GB. Let's also not forget that a single SSD can put out over 10 times as many random 4K IOPS as even the fastest fiber channel drives. So, if it comes down to $/IOPS (and don't forget space/power/cooling), SSDs are already blowing fiber channel out of the water. It depends on your application requirements, but I can see a lot of instances in which a couple of shelves of SSDs could come in really handy for some of a SAN's workload.

    12. Re:BAD MATH by afidel · · Score: 1

      I have a quote that we're about to execute that includes a bunch of 450GB 15k FC drives for ~$1,000 each, which is within spitting distance of $2/GB. Yes, on a $/IOP basis SLC does in fact beat FC drives even today, but the percentage of my storage that needs IOPS over everything is very low, so I have built a balanced array which gives me the best combination of features/price/storage/IOPS. Other people will have other requirements, and there are some that need IOPS no matter the cost, but I think those are still a small enough minority that they are not what will be driving the industry at large. For me the frontend cache is generally enough to keep my database servers happy with their log volumes writing at sub ms times; we just pin our most frequently used tables in memory, which negates any need for super fast read cache =)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    13. Re:BAD MATH by jon3k · · Score: 1

      The last quote I got from EMC for 400GB 15K FC drives was $4,000 PER DRIVE. Let me repeat that, four thousand dollars per drive. EMC wanted $1,000 for a 146GB 15K FC drive. This is their "certified" drives that they've "tested and validated"

      /me makes jerking-off motion

      Here's an HP 146GB 15K FC drive for over $1,000.

      Are you sure you don't have your prices mixed up?

      I'm sorry, maybe I wasn't clear enough. I totally agree it isn't for every workload. I think it works in a lot of instances, particularly if you're tiering your storage and have very, very high read:write ratios. Another example that I'm seeing a lot of is database indexes on SSDs. I think as SSD prices come down and performance goes up, you'll continue to see them make inroads.
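
      As a rough worked version of the per-GB arithmetic being argued in this exchange, the sketch below divides the quoted prices by the quoted capacities. The figures are the posters' approximate numbers as stated in the thread, not vendor list prices, and the labels are only descriptive.

```python
# Drive quotes mentioned in this thread: (capacity in GB, price in USD).
# These are the posters' approximate figures, not vendor list prices.
quotes = {
    "450GB 15k FC (discounted EVA quote)": (450, 1000),
    "400GB 15k FC (EMC quote)": (400, 4000),
    "146GB 15k FC (EMC quote)": (146, 1000),
}


def dollars_per_gb(capacity_gb, price_usd):
    """Return the cost per gigabyte for a single drive."""
    return price_usd / capacity_gb


per_gb = {name: dollars_per_gb(gb, usd) for name, (gb, usd) in quotes.items()}

for name, cost in per_gb.items():
    print(f"{name}: ${cost:.2f}/GB")
```

      On these numbers the discounted quote lands near $2/GB and the EMC quotes land in the $6-$10/GB range, so both posters' figures are internally consistent; they differ on discount, not on arithmetic.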

    14. Re:BAD MATH by afidel · · Score: 1

      Nope, my prices are not mixed up, we are buying an EVA with a significant number of shelves and are hence getting a decent discount, that brings the price per 450GB drive down to ~$1k. When you are buying something in that price range paying anything near list is just not doing your homework. We had competitive bids from all of the big boys except EMC (don't like to micromanage storage and their 5 year costs are so obscene as to be non-starters) and we let all the vendors know it was a competitive bid. HP came out with the right mix of features/price/performance and support.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    15. Re:BAD MATH by fuzzyfuzzyfungus · · Score: 1

      Out of curiosity(I'm not familiar with licencing for those sorts of applications), are the licencing costs per core in the machine or per core on which the application is running?

      If it is per core in the machine it would, as you say, be absurdly expensive to use general purpose cores for I/O or any other sorts of housekeeping. If it is per core running the licenced application, you'd just confine the application to X cores, as needed by your setup, and use the remaining cores in the machine for I/O and other stuff, paying only the (comparatively small) cost of additional hardware.

    16. Re:BAD MATH by afidel · · Score: 1

      It's per core/CPU in the machine unless you have hard zoning ala vmware or Solaris hard zones. If your application can possibly use more resources, you have to license at the highwater mark for resources consumed (eg Solaris soft zones are supposed to be tracked with an audit trail and paid at the most resources consumed). Before we found out that our DR licensing is gratis because we don't do live failover and test DR less than 6 times a year, we had four dual core processors in the box but were using boot.ini parameters to only use 4 cores, with the understanding that we would change it if the production box became unavailable.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    17. Re:BAD MATH by fuzzyfuzzyfungus · · Score: 1

      Interesting, thanks.

  6. enterprise storage by perlchild · · Score: 3, Insightful

    Storage has been the performance bottleneck for so long, it's a happy problem if you actually must increase the bus speeds/cpu processors/get faster memory on raid cards to keep up. Seems to me the article (or at least the summary) was written by someone who hadn't been following enterprise storage for very long...

    1. Re:enterprise storage by Anonymous Coward · · Score: 1, Insightful

      Damn straight! IO has been the bottleneck for at least 40 years. SSD is slowly opening doors to a brighter future, but we're a long way from the realistic capacity needs for business. Although I've yet to see real benchmarking that is designed for hundreds of simultaneous tasks, all the figures I see are largely rubbish assuming the user does one or two things. How about testing them on web services like digg, or on company mail servers instead of fake throughput and "feel" tests?

    2. Re:enterprise storage by ZosX · · Score: 2, Interesting

      That's kind of what I was thinking too. When you really start pushing the 300MB/s SATA gives, it's hard to find something to complain about. Most of my hard drives max out at like 60-100MB a second and even the 15k drives are not a great deal faster. Low latency, fast speeds, increased reliability. This could get interesting in the next few years. Heck why not just build a raid 0 controller into the logic card with a sata connection and break the ssd into a bunch of little chunks and raid 0 them all max performance right out of the box so you get the performance advantages of raid without the cost of a card and the waste of a slot? PCIe SSD is quite interesting too..........

    3. Re:enterprise storage by HockeyPuck · · Score: 4, Interesting

      Ah... pointing the finger at the storage... My favorite activity. Listening to DBAs, application writers, etc point the finger at the EMC DMX with 256GB of mirrored cache and 4Gb/s FC interfaces. You point your finger and say, "I need 8Gb FibreChannel!" Yet when I look at your HBA utilization over a 3-month period (including quarter end, month end, etc.), I see you averaging a paltry 100MB/s. Wow. Guess I could have saved thousands of dollars by going with 2Gb/s HBAs. Oh yeah, and you have a minimum of two HBAs per server. Running a nagios application to poll our switchports for utilization, the average host is running maybe 20% utilization of the link speed, and as you beg, "Gimme 8Gb/s FC", I look forward to your 10% utilization.
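
      To put the 100MB/s figure in context, here is a small sketch of link utilization at each Fibre Channel speed. It assumes the commonly cited nominal per-direction throughput of roughly 200, 400 and 800 MB/s for 2Gb, 4Gb and 8Gb FC; real-world ceilings depend on the HBA, switch and array.

```python
# Nominal per-direction Fibre Channel throughput in MB/s (approximate,
# commonly cited figures; treat them as assumptions, not measurements).
NOMINAL_MB_PER_S = {"2Gb": 200, "4Gb": 400, "8Gb": 800}


def utilization(observed_mb_per_s, link):
    """Fraction of the nominal link throughput that is actually in use."""
    return observed_mb_per_s / NOMINAL_MB_PER_S[link]


observed = 100  # MB/s, the average quoted in the comment above
util = {link: utilization(observed, link) for link in NOMINAL_MB_PER_S}

for link, fraction in util.items():
    print(f"{link} FC: {fraction:.1%} utilized at {observed} MB/s")
```

      Note that this only describes the average; it says nothing about how often the link is briefly saturated.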

      We've taken whole databases and loaded them into dedicated cache drives on the array, and surprise, no performance increase. DBAs and application writers have gotten so used to yelling, "Add Hardware!" that they forgot how to optimize their applications and SQL queries.

      If storage was the bottleneck, I wouldn't be loading up storage ports (FAs) with 10-15 servers. I find it funny that the only devices on my 10,000 port SAN that can sufficiently drive IO are media servers and the tape drives (LTO-4) that they push.

      If storage was the bottleneck there would be no oversubscription in the SAN or disk array. Let me know when you demand a single storage port per HBA, and I'm sure my EMC will take us all out to lunch.

      I have more data than you. :)

    4. Re:enterprise storage by Anonymous Coward · · Score: 4, Insightful

      Ah... pointing the finger at the storage... My favorite activity. Listening to DBAs, application writers, etc point the finger at the EMC DMX with 256GB of mirrored cache and 4Gb/s FC interfaces. You point your finger and say, "I need 8Gb FibreChannel!" Yet when I look at your HBA utilization over a 3-month period (including quarter end, month end, etc.), I see you averaging a paltry 100MB/s. Wow. Guess I could have saved thousands of dollars by going with 2Gb/s HBAs. Oh yeah, and you have a minimum of two HBAs per server. Running a nagios application to poll our switchports for utilization, the average host is running maybe 20% utilization of the link speed, and as you beg, "Gimme 8Gb/s FC", I look forward to your 10% utilization.

      You do sound like you know what you're doing, but there is quite a difference between average utilization and peak utilization. I have some servers that average less than 5% usage on a daily basis, but will briefly max out the connection about 5-6 times per day. For some applications, more peak speed does matter.

    5. Re:enterprise storage by swb · · Score: 1

      In my experience, DBAs and their fellow travelers in the application group like to point their finger at SANs and virtualization and scream about performance, not because the performance isn't adequate but because SANs (and virtualization) threaten their little app/db server empire. When they no longer "need" the direct attached storage, their dedicated boxes get folded into the ESX clusters and they have to slink back into their cubicles and quit being server & networking dilettantes.

    6. Re:enterprise storage by jon3k · · Score: 1

      "Heck why not just build a raid 0 controller into the logic card with a sata connection and break the ssd into a bunch of little chunks and raid 0 them all"

      Cost mostly, you'd need tons of controllers, cache, etc. Plus you can already nearly saturate SATA 3G with any decent SSD (Intel, Vertex, etc) so it's kind of pointless. The new Vertex and Intel SSDs are benchmarking at 250MB/s. No point in making them much faster until we have SATA 6G.

    7. Re:enterprise storage by Slippy. · · Score: 4, Insightful

      Sort of true, but not entirely accurate.

      Is the on-demand response slow? Stats lie. Stats mislead. Stats are only stats. The systems I'm monitoring would use more I/O if they could. Those basic read/write graphs are just the start. How's the latency? Any errors? Pathing setup good? Are the systems queuing i/o requests while waiting for i/o service response?

      And traffic is almost always bursty unless the link is maxed - you're checking out a nice graph of the maximums too, I hope? That average looks mighty deceiving when long periods are compressed. At an extreme over months or years, data points can be days. Overnight + workday could = 50%. No big deal on the average.
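
      The effect of averaging over long intervals can be sketched with made-up numbers. The trace below is entirely hypothetical: one day of per-minute link utilization, idle at 5% with six ten-minute bursts at 100%.

```python
# Hypothetical per-minute utilization (%) for one day: mostly idle at 5%,
# with six 10-minute bursts that saturate the link.
samples = [5.0] * 1440
for burst_start in (120, 360, 600, 840, 1080, 1320):  # made-up burst times
    for minute in range(burst_start, burst_start + 10):
        samples[minute] = 100.0

daily_average = sum(samples) / len(samples)
daily_peak = max(samples)

# Downsample to hourly averages, as a long-range graph typically does.
hourly = [sum(samples[h * 60:(h + 1) * 60]) / 60 for h in range(24)]
hourly_peak = max(hourly)

print(f"daily average:         {daily_average:.1f}%")
print(f"true peak (1-minute):  {daily_peak:.1f}%")
print(f"peak of hourly points: {hourly_peak:.1f}%")
```

      Under these assumptions the link is saturated for an hour a day, yet the daily average stays under 10% and even the hourly graph never shows much above 20%.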

      I have a similar usage situation on many systems, but the limits are generally still storage dependent issues like i/o latency (apps make a limited number of requests before requests start queuing), poorly grown storage (a few luns there, a few here, everything is suddenly slowing down due to striping in one over-subscribed drawer), and sometimes unexpected network latency on the SAN (switch bottlenecks on the path to the storage).

      Those graphs of i/o may look pitiful, but perhaps that's only because the poor servers can't get the data any faster.

      Older enterprise SAN units (even just 4 or 5 years ago) kinda suck performance wise. The specs are lies in the real world. A newer unit, newer drives, newer connects and just like a server, you'll be shocked. What'cha know, those 4Gb cards are good for 4Gb after all!

      Every year, there's a few changes and growth, just like in every other tech sector.

      --
      -- Life is good. Tastes like chicken.
    8. Re:enterprise storage by 7213 · · Score: 1

      Slippy you are spot on sir,

      Looking at your SAN utilization and seeing HBA throughput of next to nothing is not necessarily proof it's the app or db. As a storage admin, I'd love to say it's never our fault, but clearly if you're spending all your time looking at the network and not the disk utilization itself, you're looking in the wrong place. I agree that 8Gb and even 4Gb links for disk HBAs are usually way overkill (on average), but that often is due to the fact that the spindles on the backend (or the server itself) can't service the load being pushed by the app or db. I rarely even bother looking at perf stats on my switches, as I know they will be under 50% utilized. But when I see my disk response times going well into the double digits, I know that the switches are not at issue and we may need to address the disk layout.

      As per the death of raid, I also see its day coming. I don't mean to sound like an IBM fanboi (I'm not), but their XIV product looks like they've got the right idea for anti-raid (I think HP's EVA does this to a lesser extent). Band as many cheap ol' SATA disks together as possible, and be mad paranoid about mirroring it. Need more capacity? No worries, pop it in and we'll start moving data around to reduce access density automagically. Tiering your storage is a bad idea that needs to die (micro-management), as to lower your access density you end up short stroking. Instead put that low IO NAS device right next to your high IO financial system on disk and use both the capacity & IO ability of the drive effectively. I -personally- don't see the death of centralized storage coming anytime soon, it's just that 1) raid needs to die as we spread IO over more & more larger & slower per GB spindles, and 2) for the near term (5+ years) that SSD stuff is going to be used as a second level cache at best due to cost per GB. Either w/the disk array using it as a second stage cache, or intelligently written apps doing it for themselves (likely better for everyone).

      The 'cloud' is a lie for most enterprise class apps, at least when you get to the DB level. (p.s. I also get frustrated with the app & dba prima donnas, but it's even WORSE on the rare occasion when they are right ;-) )

    9. Re:enterprise storage by perlchild · · Score: 1

      Well boy did I not expect this kind of reaction... I'm kinda on your side, really. I meant, here's someone that's saying that SSDs means you're no longer starving for spindles... And I say "well that's good, they were holding us back, we can do something better now, that's not a problem." On the other hand, it seems it's a lot more loaded politically in places that don't do this with just three admins, and no dedicated storage admins, so I'll just shut up now cuz I hate politics. You guys have a nice day.

    10. Re:enterprise storage by Anonymous Coward · · Score: 0

      In my experience, DBAs and their fellow travelers in the application group like to point their finger at SANs and virtualization and scream about performance, not because the performance isn't adequate but because SANs (and virtualization) threaten their little app/db server empire. When they no longer "need" the direct attached storage, their dedicated boxes get folded into the ESX clusters and they have to slink back into their cubicles and quit being server & networking dilettantes.

      amen!

    11. Re:enterprise storage by Chang · · Score: 1

      Or on a large VM cluster - which thousands of data centers have in production now.

    12. Re:enterprise storage by drsmithy · · Score: 1

      Heck why not just build a raid 0 controller into the logic card with a sata connection and break the ssd into a bunch of little chunks and raid 0 them all max performance right out of the box so you get the performance advantages of raid without the cost of a card and the waste of a slot?

      Because an error anywhere nukes the whole shebang.
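
      That fragility can be put in rough numbers. The sketch below assumes each stripe member (chunk or drive) fails independently with the same probability over some period; the 3% figure is purely hypothetical.

```python
def raid0_loss_probability(p_member, n_members):
    """Chance that at least one of n independent stripe members fails.

    In RAID 0 any single member failure loses the whole array, so this is
    also the chance of losing everything.
    """
    return 1 - (1 - p_member) ** n_members


p = 0.03  # hypothetical per-member failure probability over the period
for n in (1, 2, 4, 8):
    print(f"{n} member(s): {raid0_loss_probability(p, n):.1%} chance of total loss")
```

      The independence assumption is generous; members sharing one controller or one batch of flash may well fail together.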

    13. Re:enterprise storage by markk · · Score: 1

      Who cares about average use? The cost is driven by the PEAK use. That is why the average use for HBA's is almost nothing, but you are paying double the money or more because of the 8 hours a month you need to smoke. And woe betide the Architect who suggests postponing a business meeting for 48 hours every month so he can save $20 million a year. Seriously.

    14. Re:enterprise storage by natas · · Score: 1

      Which nagios plugin are you using to poll this? We are running into the same exact situation where I work. Oracle DBAs screaming that it's IO but our SAN guys do not see it.

    15. Re:enterprise storage by Gothmolly · · Score: 0, Flamebait

      Running a nagios application to poll our switchports for utilization, the average host is running maybe 20% utilization of the link speed

      You sound gay.

      --
      I want to delete my account but Slashdot doesn't allow it.
    16. Re:enterprise storage by marcosdumay · · Score: 1

      Well, that was unexpected for me too. And you know, you are right. Real world applications behave quite differently from how academic models say they would. That is because the models didn't account for team limitations and the unavoidable mistakes (from the techies and from HR) that add up to a very significant amount on any project.

      Too bad I haven't let that academic misconception go yet. That is why I was surprised.

    17. Re:enterprise storage by Spit · · Score: 1

      What's the cost-benefit analysis of buying hardware that has headroom for those .1% peak events, vs data housekeeping and app/sql profiling? This is a management problem, not a technical one.

      --
      POKE 36879,8
    18. Re:enterprise storage by jon3k · · Score: 1

      "How about testing them on web services like digg, or on company mail servers instead of fake throughput and "feel" tests?"

      I've been waiting for the same thing. Unfortunately, SLC flash-based drives (the more expensive NAND flash with the higher lifespan) are still exceptionally expensive. But the good news is major SAN vendors are already offering SSD options. Everyone from EMC to Sun Microsystems is starting to include SSDs in their storage products. While it would be very unusual for us to get a peek into the storage systems of companies like digg, etc, hopefully they'll filter down far enough that we can get some realistic reviews soon. I'm definitely looking forward to it.

    19. Re:enterprise storage by jon3k · · Score: 1

      "What's the cost-benefit analysis of buying hardware that has headroom for those .1% peak events, vs data housekeeping and app/sql profiling? This is a management problem, not a technical one."

      Depends on the business requirements and the number of end-points, but I wouldn't rule it out completely. For example, production companies moving large amounts of video for short periods of time, it might be worth the difference between 4Gb and 8Gb fiber channel, I don't know. You're also assuming those peak events are only 1%, they could be 5%, 10% or even more. I'm not saying it's always warranted, just that it may be in some cases.

    20. Re:enterprise storage by Decker-Mage · · Score: 1

      Which just so happens to be occupying quite a bit of my thought processes today. What I'm interested in is at least one SSD array with in-line de-duplication for the VM image files. While most shops are heterogenous in their collection of image files, there is still quite a bit of overlap. SSD's very nicely take care of the IOPS and streaming bottlenecks.

      --
      "[I]t is a wise man who admits the limits of his knowledge or skill, and that pretending either causes harm." --Terry Go
    21. Re:enterprise storage by ioshhdflwuegfh · · Score: 1

      Yeah, no shit.

    22. Re:enterprise storage by Anonymous Coward · · Score: 0

      mod parent up. people just forgot how to do software properly. it is a sound decision to throw hardware at a problem rather than people, only when it is less costly. I suspect that most SAN deployments cost >$10M, and do not save the same amount or more in proper software optimization.

    23. Re:enterprise storage by afidel · · Score: 1

      Put a Netapp filer in front of something like this, of course the filer would definitely be the limiting factor for IOPS in such a configuration as even the biggest ones aren't rated for near a million IOPS.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    24. Re:enterprise storage by afidel · · Score: 1

      Your Oracle guys will have more meaningful statistics like average and worst case service latency; a DBA doesn't care that they can only get 20% saturation on the link if that's because the storage can't handle the random IOPS. I don't have much of a problem because average service time is right at average spindle speed and worst case when we aren't experiencing a fail-over event is ~2x average service time. We have also benchmarked the array and shown that it will do ~10x more 4k random IOPS than we see the DB push on an average day. After we did all that benchmarking and statistics gathering, the DBAs went back and worked with the 3rd party application guys to do some serious tuning on the SQL they were generating, resulting in some reports going from 5 minutes to under 15 seconds to run.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  7. Hardware RAID becoming less relevant every day. by Vellmont · · Score: 1, Insightful

    The first question is really, why RAID a SSD? It's already more reliable than a mechanical disk, so that argument goes out the window. You might get some increased performance, but that's often not a big factor.

    The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

    --
    AccountKiller
    1. Re:Hardware RAID becoming less relevant every day. by Anonymous Coward · · Score: 0

      You tell my boss that raid 1+0, 3, 5 or 6 is a waste of time! They sell ECC RAM for a reason as well. If a SSD craps out, which they WILL do (just look at reviews on newegg for proof of that) you'll need a RAID level with redundancy to fix it.

    2. Re:Hardware RAID becoming less relevant every day. by Anonymous Coward · · Score: 0

      General purpose CPUs are bad at calculating parity in comparison to dedicated hardware.

      There will still be a want for RAID as it provides a measure of data integrity without duplication of data. e.g RAID 5 you lose 1/4 of data space in parity instead of 1/2 in mirroring.
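
      A minimal sketch of the parity idea being described, assuming a four-drive layout (three data blocks plus one parity block per stripe). Real RAID 5 rotates the parity block across drives and works on much larger blocks; this only shows the XOR reconstruction and the capacity overhead.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive array: 3 data blocks + 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive holding the second block, then rebuild it
# from the surviving data blocks plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)


def parity_overhead(n_drives):
    """Fraction of raw capacity spent on single parity across n drives."""
    return 1 / n_drives


print(f"4-drive single parity overhead: {parity_overhead(4):.0%} (mirroring: 50%)")
```

      The 1/4 figure in the comment therefore corresponds to a four-drive set; with more drives the parity share shrinks further.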

    3. Re:Hardware RAID becoming less relevant every day. by Alain+Williams · · Score: 2, Informative

      The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

      I much prefer s/ware raid (Linux kernel dm_mirror), it removes a complicated piece of h/ware which is just another thing to go wrong. It also means that you can see the real disks that make up the mirror and so monitor it with the smart tools.

      OK: if you do raid5 rather than mirroring (raid1) you might want a h/ware card to offload the work to, but for many systems a few terabyte disks are big and cheap enough to just mirror.

    4. Re:Hardware RAID becoming less relevant every day. by adisakp · · Score: 1

      The first question is really, why RAID a SSD? It's already more reliable than a mechanical disk, so that argument goes out the window. You might get some increased performance, but that's often not a big factor.

      The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

      RAID0 for Speed. SSD's in RAID0 can perform 2.5-3X faster than a single drive. A RAID SSD array can challenge the speed of a FusionIO card that is several thousand dollars.
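
      The speedup comes from consecutive stripes landing on different drives, so large transfers are serviced in parallel. A simplified sketch of the address mapping, with a hypothetical function name and no claim to match any particular controller's layout:

```python
def locate(logical_block, n_disks, stripe_blocks=1):
    """Map a logical block number to (disk index, block offset on that disk).

    Simplified RAID 0 layout: stripes of `stripe_blocks` blocks are dealt
    out to the disks round-robin.
    """
    stripe_index, within_stripe = divmod(logical_block, stripe_blocks)
    disk = stripe_index % n_disks
    row = stripe_index // n_disks
    return disk, row * stripe_blocks + within_stripe


# Six consecutive logical blocks on a 3-disk array touch every disk twice.
for block in range(6):
    disk, offset = locate(block, n_disks=3)
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```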

      Now that the new faster 34nm Intel SSD's can be preordered for under $250, it's reasonable for an enthusiast to buy 3-4 of them and throw them in a RAID0 array. Also, software (or built-in MB RAID) is fine -- a lot of the sites have shown that 3 SSD drives is the sweet point for price/performance using standard MB RAID controllers. If you want 4 or more, to see performance, you need a more $$$ separate controller card.

    5. Re:Hardware RAID becoming less relevant every day. by potHead42 · · Score: 1

      It also means that you can see the real disks that make up the mirror and so monitor it with the smart tools.

      With 3ware RAID controllers this is already possible, you just have to specify the magic device /dev/twa0 (for the first controller) and use the smartd/smartctl option "-d 3ware,0", where 0 specifies the disk number. I assume other controllers have something similar.
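
      As a sketch of how that might be scripted, the helper below only builds the smartctl command lines and does not run them. The `-a` (print all SMART information) and `-d 3ware,N` options are smartctl's own; the helper name and the port count of four are assumptions for illustration.

```python
def smartctl_3ware_cmd(disk_number, controller_device="/dev/twa0"):
    """Build (but do not run) a smartctl command for one disk behind a 3ware card."""
    return ["smartctl", "-a", "-d", f"3ware,{disk_number}", controller_device]


# Hypothetical controller with four ports populated.
commands = [smartctl_3ware_cmd(n) for n in range(4)]
for cmd in commands:
    print(" ".join(cmd))
```

      Passing each list to `subprocess.run` would execute it, which normally requires root and the controller's device node to exist.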

      But yeah, I also prefer software RAID, especially when using ZFS ;-)

    6. Re:Hardware RAID becoming less relevant every day. by Yert · · Score: 1

      Because I can't buy a 26TB SSD drive, but I can put 52 500GB SSD drives in two CoreRAID chassis and mount them as one filesystem...as opposed to the 2 Sun storage arrays we use now, that are fiber attached and starting to get a little ... slow. SSDs would give us 10x the IO overhead.

      --
      Truck driver, plumber, Linux systems engineer.
    7. Re:Hardware RAID becoming less relevant every day. by TerminaMorte · · Score: 1

      In enterprise, it doesn't matter if the disk has a less likely chance of failing; redundancy for HA is worth the extra cost.

      If someone is spending the money on SSD then performance had better be a big factor!

    8. Re:Hardware RAID becoming less relevant every day. by Rockoon · · Score: 1

      Don't forget about the "Battleship MTRON" guys that raided up 8 MTRON SSD's (the fastest SSD's at the time) several years ago and then had a lot of trouble actually finding a raid controller that could handle the bandwidth. This year's SSD's are twice as fast, and expect performance to double again within 12 months.

      --
      "His name was James Damore."
    9. Re:Hardware RAID becoming less relevant every day. by Anonymous Coward · · Score: 0

      > OK: if you do raid5 ... .. you deserve to be shot.

    10. Re:Hardware RAID becoming less relevant every day. by mysidia · · Score: 2, Informative

      Well, ZFS is great, but don't get that mixed up with software RAID. It's not. The storage redundancy algorithms used by ZFS are not the RAID algorithms, such that using ZFS is much better than using EITHER hardware or software RAID.

      ZFS provides performance and data integrity assurance that standard RAID does not. Primarily, because filesystem level data is checksummed, and it should be almost impossible for silent data corruption to occur at the storage device level, except cases where the data written actually matches the checksums, (a later 'zpool scrub' should detect it, if ZFS is implemented properly).
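
      The general idea of checksum-verified reads with repair from a redundant copy can be sketched as follows. This is an illustration of the concept only, not ZFS's on-disk format or code; SHA-256 is used as a convenient stand-in checksum and the function names are made up.

```python
import hashlib


def checksum(block):
    """Stand-in block checksum (SHA-256 chosen only for convenience)."""
    return hashlib.sha256(block).digest()


def read_with_repair(copies, expected):
    """Return a copy whose checksum matches, rewriting any copies that do not.

    `copies` is a list of redundant copies of one block; `expected` is the
    checksum stored separately from the data.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum; data is unrecoverable")
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good  # repair the silently corrupted copy
    return good


original = b"important data"
expected = checksum(original)
copies = [b"important dat\x00", original]  # first copy silently corrupted

result = read_with_repair(copies, expected)
print(result, copies[0] == original)
```

      A plain mirror without checksums has no way to tell which of two differing copies is the correct one, which is the distinction the comment is drawing.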

      But aside from ZFS, software RAID (and even fakeraid/hostraid hardware adapters that perform RAID in the driver) really really suck both in terms of reliability, data integrity, and performance when you need to push things to the maximum, compared to a good hardware RAID controller; software RAID is measurably slower on the same CPU and memory.

      SMART provides so little of what you need to be doing to keep a reliable array, it isn't even funny.

      Good hardware controllers keep metadata and do frequent consistency checks / "scrubs" / surface scans, to ensure every bit of data is periodically read from every drive, so HDD firmware has an opportunity to fix errors before they become "unrecoverable read errors".

      Hardware controllers will also detect when a hard drive is having a problem that cannot be easily identified by software. Hard drives are directly plugged into the controller; it can detect things such as abnormal command response latencies.

      A software controller can't be sure the abnormal latency isn't due to other workload on the bus, or "not a drive failing", so the HW controller is more responsive to failure.

      HW controllers also provide writethrough caching, and sometimes have a BBU with a full writeback cache, which drastically helps performance, and reduces the RAID performance penalty, which software RAID doesn't mitigate, but in fact makes worse.

      Oh yes, and Good controllers also have monitoring and administration tools for various OSes, including Linux, Windows, and Solaris, produced by the manufacturer.

      Many of the good controllers come equipped with audible alarms and terminals for you to plug drive failure LEDs into, so that anyone near the server can know a drive has failed, and which one.

    11. Re:Hardware RAID becoming less relevant every day. by Helmholtz · · Score: 1

      This is where ZFS has some potential to become even more important than it already is.

      The reason you RAID a SSD is to protect against silent data corruption, which SSDs are not immune from. While you don't necessarily need RAID for this with ZFS, it certainly makes it easier.

      The point about the insane abundance of CPU power is one that ZFS specifically takes advantage of right out of the starting gate.

      --
      RFC2119
    12. Re:Hardware RAID becoming less relevant every day. by jon3k · · Score: 1

      I'll take 4 drives in RAID10 please :)

    13. Re:Hardware RAID becoming less relevant every day. by darkjedi521 · · Score: 1

      I have data sets spanning multiple terabytes. One recent PhD graduate in the lab I support accumulated 20 TB of results during his time here. Even if I had highly reliable SSDs that never failed, I'd still toss the SSDs together in a zpool to get the capacities I need to accommodate a single data set. RAID is not just about redundancy. With SSDs, I'd probably use RAID5 instead of RAID6 just in case I had a freak bad drive, but RAID in some form is here to stay.

    14. Re:Hardware RAID becoming less relevant every day. by PiSkyHi · · Score: 1

      You've added a piece of hardware to do RAID, which may have more bells and whistles, but all I really want is an email when a drive fails. The drive is nothing compared to the data, so all I need is a controller that supports hotplug. Silicon Image make one that should be standard on most MBs but isn't yet because of hardware RAID being an industry trying to stay alive.

      If the controller fails, for Hardware RAID, I'm looking at wasting time and a lot of cash to get that data back online. For software RAID, a controller is a no brainer.

      performance ? dedicate a cheap PC to the array, you can always change your mind with software.

      Getting firmware that can merely read and write a drive at some stage in the future is always going to be easier than managing the application level RAID management software updates.

      Seriously, the email is enough of bells and whistles for a storage array, I hope no one in your work area has to sit near the actual thing.

    15. Re:Hardware RAID becoming less relevant every day. by duguk · · Score: 1

      > OK: if you do raid5 ... .. you deserve to be shot.

      There's nothing wrong with RAID5 in the right circumstances (large home server?), but if you use it instead of a backup you deserve to be shot.

    16. Re:Hardware RAID becoming less relevant every day. by mysidia · · Score: 1

      The chance of a controller failing is almost negligible; it's similar to the chance of a NIC or CPU failing: hard drives fail much more often. If you stick with a standard common controller type for all your servers, eg use all HP DL3xx or Dell PE 29xx servers with embedded controllers (for example), getting a spare should be easy, cheaper than the offline spare HD you should be keeping to restore redundancy to the array, and the broken controller should be covered by warranty.

      Well, servers belong on server racks in closed rooms; they're so loud, that if someone had to sit near it, the fan noise from all the servers would be overwhelming.

      And just getting an e-mail has the problem of not identifying precisely which drive has failed.

      If you call up the datacenter tech (remote hands) and tell them to swap out the drive with the spare, there's a chance they'll accidentally pull the wrong drive, or pull the right drive from the wrong server.

      Visible indications tend to be pretty useful in avoiding mistakes, and it's a good idea to take every reasonable precaution in assuring mistakes don't happen, if the server uptime is important: if it's not, why use RAID? Just swap the drive and load the backup, that procedure is a lot more reliable than hot plugging.

    17. Re:Hardware RAID becoming less relevant every day. by drsmithy · · Score: 1

      The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

      Transparency and simplicity. It's a lot easier dealing with a single device than a dozen.

    18. Re:Hardware RAID becoming less relevant every day. by amRadioHed · · Score: 1

      That's only true to a point. If the reliability of the SSD gets to the point where it's about as likely as the RAID controller to fail, then the RAID controller is just an extra point of failure that will not increase your availability at all. However, AFAIK SSDs aren't that reliable yet so the RAID controllers are still worth it.

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
    19. Re:Hardware RAID becoming less relevant every day. by dbIII · · Score: 1

      RAID 5 is so that people can keep on working after a disk failure instead of having to wait for you to restore it all from backup. With a hot spare and a decent controller all they will notice is a slowdown while it's rebuilding the array onto the new disk.
      Of course if you lose two drives everyone has to wait for you to restore it all from backup. There's also RAID6, but it just gives you a bit more leeway in the number of drives you can use.
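
      A minimal sketch of why the array keeps working after one disk is lost, assuming plain byte-wise XOR parity over equal-sized blocks. The function names and sample blocks are made up for illustration; a real RAID 5 implementation works on stripes and rotates the parity block across drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Three hypothetical data blocks, one per drive, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second drive died: rebuild its block from the other two.
rebuilt = rebuild_missing([data[0], data[2]], parity)
print(rebuilt)
```

      With two blocks missing there is only one parity equation for two unknowns, which is why a second failure means a restore from backup.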

    20. Re:Hardware RAID becoming less relevant every day. by PiSkyHi · · Score: 1

      Identifying the drive is not an issue - just check the system log - if you set it up correctly, the software RAID will have labels on drives - I label both physically and virtually. By virtually, I mean a fake partition with the drive number as partition type.

      A label is as good as a lamp, using a hotswap drive bay.

      Hot-plugging is all you need in terms of hardware for RAID support - it means that the system won't go down because of a drive failure, and the email will get through to you. Of course, when I say software RAID, I mean Linux software RAID, not Motherboard Software RAID.

      Software RAID 5 is going to take around 10% of a modern CPU core, but it will ruin the cache, so a dedicated fileserver makes sense - if you need more speed than one PC can deliver, then another PC with software load-sharing is a better option than souping up hardware anyway, as you will probably hit other limits when using many disks regardless of your choice of array.

    21. Re:Hardware RAID becoming less relevant every day. by mysidia · · Score: 1

      I've seen way too many cases with SW RAID5 in particular, where there are 5 drives.

      Drive /dev/sdb fails, but no one has a clue which physical drive /dev/sdb actually is, which MB SATA port is the "second" one, or even that the OS may for some reason have re-ordered the drives, so that what was /dev/sda last boot is /dev/sdb this boot. Nor do they know to check dmesg and the files in /sys for the SCSI IDs of the volumes when plugging them in, and again after a failure, to find the true drive.

      It may seem like it's easy to tell people to just label everything, and test that the labels match, especially after any changes.

      But in practice, people don't. When setting up their servers, your average Linux user tends to just plug everything in, slap on the SW RAID OS install, and send the server straight into production with minimal labelling of anything; most people won't even label the Ethernet ports Eth0, Eth1, Eth2, Eth3 if the server vendor didn't print numbers on them.

      The first thing you need to realize, when it comes to designing reliable server configurations: people fail more often than hardware does.

      If there's a way the cabling to the MB SATA ports could somehow get messed up, so the physical port labelled "Port 1" is actually "Port 2" and the one labelled "Port 2" is actually port 1, it will eventually happen, it will be messed up at the worst possible time, on the most important server.
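
      One way to take the guesswork out of "which drive is /dev/sdb this boot" is to record the persistent names next to the kernel names. The sketch below assumes a Linux system where udev populates /dev/disk/by-id with symlinks (names there usually include model and serial number); the function name is made up, and the directory is a parameter so it can be pointed elsewhere.

```python
import os

def map_devices_to_ids(by_id_dir="/dev/disk/by-id"):
    """Return {kernel device path: [persistent id names]} from udev symlinks."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        # Not Linux, or udev is not populating this directory.
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if not os.path.islink(link):
            continue
        # Resolve the symlink to the kernel name, e.g. /dev/sdb.
        target = os.path.realpath(link)
        mapping.setdefault(target, []).append(name)
    return mapping

if __name__ == "__main__":
    for device, ids in sorted(map_devices_to_ids().items()):
        print(device, "->", ", ".join(ids))
```

      Printing this at install time and sticking the serial on the drive tray gives the remote hands something to match against, whatever order the OS enumerates the disks in.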

    22. Re:Hardware RAID becoming less relevant every day. by jon3k · · Score: 0, Troll

      "f a SSD craps out, which they WILL do (just look at reviews on newegg for proof of that) "

      Intel offers a 3 year warranty on their new drives, most other vendors offer 1 (OCZ, Samsung, etc). I also know of at least 20 SSDs used by friends and family and I've yet to hear of one fail. So, unless you can provide some actual evidence - shut the fuck up.

    23. Re:Hardware RAID becoming less relevant every day. by jon3k · · Score: 1

      "However, AFAIK SSDs aren't that reliable yet so the RAID controllers are still worth it."

      Please stop spreading baseless FUD.

    24. Re:Hardware RAID becoming less relevant every day. by PiSkyHi · · Score: 1

      OK, I agree that people failing is going to create more problems. If you are paid to take care of hardware, then it depends on how competent you think you are.

      The first RAID box I had to deal with was a Dell PE1950, it had Linux support via Matt Domsch, it was reliable, but the 3 drive RAID5 had a throughput of 9Mb/sec. At that stage, getting the management software to run meant using a particular version of Redhat, and I am a Debian user.

      Now, software RAID I can control fully with a hot-swap SATA controller and if you know what you're doing, it's very simple. Install each drive one by one via a hot swap bay and label it with a sticky label and fdisk - it doesn't matter what order they come up in after that, you can always find the right drive that is no longer present by using "fdisk -l". No need for cable management, or slot management, plus you can place a boot sector on more than 1 drive to ensure it always boots. The only way you won't receive your email is if the network fails or 2 drives go at the same time, same as hardware RAID, only setting up an email with a hardware RAID controller is more of a pain than installing the version of Redhat they expect you to use.

      I can appreciate that for a manager who doesn't have a competent tech around, hardware RAID is the way, but for a tech, hardware RAID is now redundant, expensive and limiting on your choice of OS to manage the unit remotely.

    25. Re:Hardware RAID becoming less relevant every day. by PiSkyHi · · Score: 1

      one more thing... I was given responsibility of an IBM E-Server a few years ago. It tried to help me by indicating a drive was failing prior to it failing.

      It did this by refusing to boot up until someone had pressed a key...

      ... on the CONSOLE!

      So, your point about some hardware controllers being better than others is a major one, considering the outlay to find out.

    26. Re:Hardware RAID becoming less relevant every day. by amRadioHed · · Score: 1

      What FUD? I said AFAIK. I haven't been following them closely. If I'm wrong please feel free to correct me instead of jumping down my throat.

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
    27. Re:Hardware RAID becoming less relevant every day. by mysidia · · Score: 1

      So what? Some vendors offer 5 year warranties for mechanical drives too.

      SSDs do fail, just in different ways than mechanical drives do. In addition, all SSD flash eventually does fail in a specific predictable way after a large number of read/write cycles.

      Entire flash cells can die, and this is much more likely to happen than a failure of other components in a server. It does make sense to RAID1 SSD drives, if you need reliability.

    28. Re:Hardware RAID becoming less relevant every day. by Anonymous Coward · · Score: 0

      Terabytes? What kind of drives are you talking about, low performance SATA? If you are using SATA, you don't really care that much about performance, so it's pretty moot. High performance 15k SAS on the other hand comes in much smaller capacity sizes.

    29. Re:Hardware RAID becoming less relevant every day. by badkarmadayaccount · · Score: 1
      Actually, the formula for RAID 5 is

      [UsableCapacity] = ( [#ofdrives]-1 ) * [TotalCapacity] / [#ofdrives]

      Also, I'd say the best solution would be a hybrid software/hardware RAID, where the parity calculations are offloaded to algorithmically specialized, but general purpose hardware (i.e. GPGPU). Just my $0.02.
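
      A worked instance of the formula above, assuming equal-sized drives and taking [TotalCapacity] to mean the raw sum of all drives. The function name and the five 500 GB drives are only an example.

```python
def raid5_usable_capacity(drive_count, drive_size_gb):
    """Usable RAID 5 capacity in GB for equal-sized drives."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    total_capacity = drive_count * drive_size_gb
    # (n - 1) * total / n: one drive's worth of space goes to parity.
    return (drive_count - 1) * total_capacity / drive_count

print(raid5_usable_capacity(5, 500))  # 2000.0
```

      With equal drives this reduces to (n - 1) times the size of one drive.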

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    30. Re:Hardware RAID becoming less relevant every day. by badkarmadayaccount · · Score: 1

      Yeah, there are lots of cycles, but cache and bandwidth are lacking. You would be better off offloading the stuff on the GPU. I wonder what the Gallium3D guys are doing these days...

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    31. Re:Hardware RAID becoming less relevant every day. by jon3k · · Score: 1

      "FUD" is fear, uncertainty and doubt. By saying unproven, un-researched things like this you continue to promote a commonly held misconception about solid state drives. It's like two women at the hair salon gossiping, stop it. Either research it and come up with a useful contribution to the thread or just don't reply.

    32. Re:Hardware RAID becoming less relevant every day. by amRadioHed · · Score: 1

      How about you come up with a useful contribution to the thread or not reply? Yelling FUD isn't useful. If you have information about how the reliability of SSDs compares to RAID controllers feel free to provide it. As it is you've contributed nothing.

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
    33. Re:Hardware RAID becoming less relevant every day. by jon3k · · Score: 1

      My contribution is to point out we don't have any long term data on the reliability of SSDs as opposed to saying things like "As Far As I Know" when you know absolutely nothing.

    34. Re:Hardware RAID becoming less relevant every day. by amRadioHed · · Score: 1

      Wow. Thanks for the wasted time.

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
  8. Real DC by Anonymous Coward · · Score: 0

    Run FiberChannel

  9. iscsi, 10gig by Colin+Smith · · Score: 1

    Multiple interfaces and lots of block servers.

    Does anyone actually still use NFS?

     

    --
    Deleted
    1. Re:iscsi, 10gig by Anonymous Coward · · Score: 2, Informative

      Of course. NFS provides an easy to use concurrent shared filesystem that doesn't require any cluster overhead or complication like GFS or GPFS.

    2. Re:iscsi, 10gig by drsmithy · · Score: 2, Informative

      Does anyone actually still use NFS?

      Of course. It's nearly always fast enough, trivially simple to set up, and doesn't need complicated and fragile clustering software so that multiple systems can access the same disk space.

    3. Re:iscsi, 10gig by dstar · · Score: 1

      Where I work, we've only got a few petabytes of NFS storage. And it's only used for mission critical (in the literal meaning of the term -- no access to data, no work gets done, literally $millions lost if a deadline is blown) data.

      NetApp doesn't seem to be having any trouble selling NFS, either.

      So no, I don't think anyone uses NFS anymore.

    4. Re:iscsi, 10gig by guruevi · · Score: 1

      Not everybody (hardly anyone) needs a single block device in a work environment. You might as well hang the hard drive in their systems if that's all you need, cheaper, faster and simpler. Also block devices don't separate very well. You have to assign and reserve a certain block of data no matter whether it's used.

      NFS is much more granular that way, you put everything on a large block device, give it some permissions and you're good to go. Also for shared data, sharing block devices might not be a good idea because of locking issues etc. NFS handles this much more elegantly.

      Besides, there is not really an alternative to NFS these days. SMB is still too slow and uses too many resources; other protocols are OS- or vendor-specific or require at least the loading of drivers or kernel modules. And they all offer service similar to NFS. It's not encrypted, just packaged differently and hopefully it has some extra features but with a decent NFS setup this can usually get squared away.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
  10. Not quite by greg1104 · · Score: 3, Informative

    There may need to be some minor rethinking of controller throughput for read applications on smaller data sets for SSD. But right now, I regularly saturate the controller or bus when running sequential RW tests against a large number of physical drives in a RAID{1}0 array, so it's not like that's anything new. Using SSD just makes it more likely that will happen even on random workloads.

    There are two major problems with this analysis though. The first is that it presumes SSD will be large enough for the sorts of workloads people with RAID controllers encounter. While there are certainly people using such controllers to accelerate small data sets, you'll find just as many people who are using RAID to handle large amounts of data. Right now, if you've got terabytes of stuff, it's just not practical to use SSD yet. For example, I do database work for a living, and the only place we're using SSD right now is for holding indexes. None of the data can fit, and the data growth volume is such that I don't even expect SSDs to ever catch up--hard drives are just keeping up with the pace of data growth.

    The second problem is that SSDs rely on volatile write caches in order to achieve their stated write performance, which is just plain not acceptable for enterprise applications where honoring fsync is important, like all database ones. You end up with disk corruption if there's a crash, and as you can see in that article once everything was switched to only relying on non-volatile cache the performance of the SSD wasn't that much better than the RAID 10 system under test. The write IOPS claims of Intel's SSD products are garbage if you care about honoring write guarantees, which means it's not that hard to keep up with them after all on the write side in a serious application.

    1. Re:Not quite by A+beautiful+mind · · Score: 2, Insightful

      The second problem is that SSDs rely on volatile write caches in order to achieve their stated write performance, which is just plain not acceptable for enterprise applications where honoring fsync is important, like all database ones. You end up with disk corruption if there's a crash, and as you can see in that article once everything was switched to only relying on non-volatile cache the performance of the SSD wasn't that much better than the RAID 10 system under test. The write IOPS claims of Intel's SSD products are garbage if you care about honoring write guarantees, which means it's not that hard to keep up with them after all on the write side in a serious application.

      Most enterprise level SSDs have BBWC already for exactly that reason. On those systems fsync is a noop. I for one am looking forward to SSDs in enterprise level applications, we could easily consolidate current database servers that are IOPS bottlenecked, with very low levels of CPU and non-caching memory utilization. BBWC solves the "oh, but we need to honour fsync" kind of problems. We're looking at a performance increase of 10-20x (IOPS) easily if >500G enterprise level SSDs become available for database servers. Even if prices/GB stay way above SAN prices, it's still more than worth it to switch.

      --
      It takes a man to suffer ignorance and smile
      Be yourself no matter what they say
    2. Re:Not quite by Anonymous Coward · · Score: 0

      Parent has it right. There are good things to use SSD for (I can't talk about them), but flat out replacement of RAID arrays isn't one of them.

    3. Re:Not quite by greg1104 · · Score: 2, Insightful

      You can't turn fsync into a complete noop just by putting a cache in the middle. A fsync call on the OS side that forces that write out to cache will block if the BBWC is full for example, and if the underlying device can't write fast enough without its own cache being turned on you'll still be in trouble.
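
      For anyone who wants to see what fsync costs on their own stack, here is a rough sketch that times a run of small writes with an fsync after each one. It is not a benchmark and not a corruption test: the function name, record size and count are arbitrary, and a device with a volatile write cache can acknowledge the flush early, so a fast result proves nothing about durability.

```python
import os
import tempfile
import time

def timed_sync_writes(path, count=100, payload=b"x" * 512):
    """Append `count` records, calling fsync after each; return elapsed seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            # Ask the OS to push this write down to the storage stack
            # before the next "commit" is allowed to start.
            os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = timed_sync_writes(os.path.join(tmp, "commit.log"))
        print(f"100 fsync'd writes took {elapsed:.4f} s")
```

      Each loop iteration blocks until the layer below reports completion, which is the point where a full BBWC or a slow underlying device shows up as latency.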

      While the cache in the middle will improve the situation by coalescing writes into the form the SSD can handle efficiently, the published SSD write IOPS numbers are still quite inflated relative to what you'll actually see. What I was trying to suggest is that the performance gap isn't nearly as large as suggested by the article of TFA once you start building real-world systems around them. After all, regular discs benefit from the write combining to lower seeks you get out of a BBWC, too, even more than the SSDs do.

      The other funny thing you discover if you benchmark enough of these things is that a regular hard drive confined to only use as much space as a SSD provides is quite a bit faster too. When you limit a 500GB SATA drive to only use 64GB (a standard bit of short stroking), there's a big improvement in sequential and seek speeds there. If you want to be fair, you should only compare your hard drive's IOPS when it's configured to only provide as much space as the SSD you're comparing against.

    4. Re:Not quite by AllynM · · Score: 1

      First a quick clarification: Intel X25 series SSDs do not use their RAM as a data writeback cache. Intel ships racks full of both M and E series drives, with those drives living in a RAID configuration. They couldn't pull that off if the array was corrupted on power loss. The competition had to start using large caches to reduce write stutters and increase random write performance, mostly in an attempt to catch up to Intel.

      The parent article is a bit 'off' as far as bandwidth vs. IOPS on RAID controllers. You can saturate even the best PCI-e RAID cards with only spinning disks. I'm currently pegging an Areca with 10 1TB 5400 RPM drives. The ultimate bandwidth is not limited by bus speed - it is the speed of the internal data pipelines within the card itself. I have yet to see a RAID card pull anywhere close to the theoretical 2 GB/sec possible over PCI-e x8. The 24-drive crazy Samsung RAID video that's floating around required three different RAIDs going in parallel to hit 2 GB/sec.

      What people also need to realize is that high end RAID cards were built around a theory of using a large cache and a dedicated processor to handle XOR calculations for RAID-5 and 6. Even the best performing cards will, at best, perform on-par with a high IOPS SSD like an X25 series.

      The parent article also speaks briefly of Native Command Queuing, hinting that it is not implemented in RAID cards. This is flat out wrong:

      1. Only very high end cards properly implement NCQ at the host and drive level (i.e. Areca):
      http://www.pcper.com/article.php?aid=695&type=expert&pid=6

      2. Only some SSDs implement NCQ beyond a queue depth of about 4 (i.e. Intel).
      http://www.pcper.com/article.php?aid=750&type=expert&pid=8

      The *real* reason even the best RAID hardware does not scale properly with SSD usage is the fact that a good RAID card has an upper IOPS limit matching just *one* SSD. Adding more SSDs only increases throughput, and it takes roughly half the number of SSDs to saturate a given controller (as compared to using HDDs).

      The parent article heavily confuses 'streaming' with 'IOPS'. A given RAID card can 'stream' just as well with either HDDs or SSDs. Where 'IOPS' comes into the equation is how far your average throughput drops as those requests become more random in nature. Random accesses cause the RAID controller to have to juggle more data. Here is an example: Placing an X25-M G2 behind an Areca RAID card will result in a *reduction* in IOPS, but no change in sequential throughput. The RAID card processor simply can't juggle the commands as fast as if that same X25-M G2 was connected to the motherboard controller directly. With a single SSD outmaneuvering the RAID controller, adding more SSDs only helps the RAID scale in sequential throughput, not IOPS.

      For SSDs to behave properly behind a RAID, the entire RAID process needs to be rethought. You don't need a bunch of writeback cache and a bulky controller architecture. You need a very lightweight XOR engine with *no* cache. The best example of this is creating a RAID of SSDs on an Intel ICH-10R controller. IOPS scales beautifully. 3 or 4 X25s on an ICH-10R will even outmaneuver an ioDrive, and gives several times the IOPS performance of any RAID card.

      Allyn Malventano
      Storage Editor, PC Perspective

      --
      this sig was brought to you by the letter /.
    5. Re:Not quite by Anonymous Coward · · Score: 1, Informative

      First a quick clarification: Intel X25 series SSDs do not use their RAM as a data writeback cache. Intel ships racks full of both M and E series drives, with those drives living in a RAID configuration. They couldn't pull that off if the array was corrupted on power loss.

      While it would be nice if this were true, since Intel's FAQ references a write cache and database-oriented tests like the one I referenced show data corruption, the paranoid (which includes everyone who works on database and similar enterprise apps) have to presume there's still a problem until some trustworthy studies to the contrary appear. Please let me know if you're aware of any. Your argument of "they couldn't pull that off" is not a data point, because millions of hard drives with a lying write cache are shipped every year to people who think they're just fine, and who don't experience corruption on power loss. Those same drives show corruption just fine if you do a database-oriented corruption test on them.

      Until I see SSD vendors giving very clear statements about their write caching and they start passing tests specifically aimed at discovering this type of corruption, you have to assume that the situation with them is just as bad as it's always been with regular IDE or SATA disks--drives lie. The only such test I've seen so far using the Intel drives is from Vadim, the X25-E failed. It would be great if the coverage you were doing at PC Perspective, expanded to cover this issue fully; write-cache enabled?, diskchecker.pl, and faking the sync have good introductions to this issue and how to run such tests yourself.

    6. Re:Not quite by jon3k · · Score: 1

      "The other funny thing you discover if you benchmark enough of these things is that a regular hard drive confined to only use as much space as a SSD provides is quite a bit faster too."

      Yes there's an improvement, but to compare read IOPS from an enterprise SSD to a short-stroked SATA disk on a purely performance basis isn't even close. We're talking orders of magnitude slower.

      I think SSDs really shine when you get into situations where your performance requirements vastly outweigh your capacity requirements. When you need 100k IOPS for 10TB (in a predominantly read-heavy workload) and you're short stroking 500 HDDs, all of a sudden a few shelves of SSDs start looking mighty attractive (failure rate, power, management -- cost/sqft).
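
      The sizing argument can be put as arithmetic: you need enough drives to meet the IOPS target and enough to meet the capacity target, and whichever count is larger wins. In this sketch the 200 IOPS per HDD follows from 100k IOPS over 500 drives; the 64 GB per drive and the 10,000 IOPS per SSD are made-up illustrative figures, not measurements.

```python
import math

def drives_needed(target_iops, target_capacity_gb, iops_per_drive, capacity_per_drive_gb):
    """Return (drives required, drives for IOPS, drives for capacity)."""
    for_iops = math.ceil(target_iops / iops_per_drive)
    for_capacity = math.ceil(target_capacity_gb / capacity_per_drive_gb)
    return max(for_iops, for_capacity), for_iops, for_capacity

# Workload from the comment: 100k IOPS over 10 TB (taken as 10,000 GB).
hdd = drives_needed(100_000, 10_000, iops_per_drive=200, capacity_per_drive_gb=64)
ssd = drives_needed(100_000, 10_000, iops_per_drive=10_000, capacity_per_drive_gb=64)

print("HDD:", hdd)  # (500, 500, 157): limited by IOPS
print("SSD:", ssd)  # (157, 10, 157): limited by capacity
```

      Under these assumptions the HDD array is sized by IOPS and the SSD array by capacity.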

    7. Re:Not quite by owlstead · · Score: 1

      There are two major problems with this analysis though. The first is that it presumes SSD will be large enough for the sorts of workloads people with RAID controllers encounter. While there are certainly people using such controllers to accelerate small data sets, you'll find just as many people who are using RAID to handle large amounts of data. Right now, if you've got terabytes of stuff, it's just not practical to use SSD yet. For example, I do database work for living, and the only place we're using SSD right now is for holding indexes.

      That's probably true for your databases, but are databases that measure in terabytes really the norm?

      None of the data can fit, and the data growth volume is such that I don't even expect SSDs to ever catch up--hard drives are just keeping up with the pace of data growth.

      Intel's latest SSDs already have room for 320 GB. These are low end consumer disks. Once these things get popular you'll see a sharp increase in production volume. The growth *rate* of flash SSD is very, very high. They haven't caught up yet but I'm quite sure that they will, if only because the hard disks only seem to have these three advantages (size, price and many years of experience with them).

      The problem with the volatile write caches seems to be debunked as well, so I'm not so sure about your comment.

      For my personal use, the Intel G2 SSD that I ordered does 80 MB/s writes continuously. For me this means that if my PC shuts down suddenly, the chances of there being any dirty data in the cache is really low - much better than with a hard-disk anyway. But in my home environment it's likely that 99% of the time, there is no data to be written, which is totally different than in a high volume DB environment.

    8. Re:Not quite by Anonymous Coward · · Score: 0

      I think you're misunderstanding exactly where the first "B" resides in the parent's reference to BBWC. Enterprise level SSD's have either a battery or a supercap onboard the drive itself, effectively turning the on-disk volatile write cache into a non-volatile write cache. In that case, fsync / synchronize_cache is definitely a noop.

      These drives are taking a bit longer to come out (enterprise drive qualification is a b*tch), and they cost a LOT more per GB than the consumer grade drives we're talking about here, but they are capable of sustaining the published IOps numbers while still maintaining data integrity in the event of a power loss.

  11. All wrong. by sirwired · · Score: 2, Informative

    1) Most high-end RAID controllers aren't used for file serving. They are used to serve databases. Changes in filesystem technology don't affect them one bit, as most of the storage allocation decisions are made by the database.
    2) Assuming that a SSD controller that can pump 55k IOPS w/ 512B I/O's can do the same w/ 4K I/O's is stupid and probably wrong. That is Cringely math; could this guy possibly be as lame?
    3) The databases high-end RAID arrays get mostly used for do not now, and never have, used much bandwidth. They aren't going to magically do so just because the underlying disks (which the front-end server never even sees) can now handle more IOPS.
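
    To put numbers on point 2: bandwidth is just IOPS times I/O size, so holding 55k IOPS while going from 512B to 4K I/Os means moving eight times as many bytes per second. A small sketch (the function name is made up, and MB here means 10^6 bytes):

```python
def throughput_mb_per_s(iops, block_bytes):
    """Bandwidth implied by an IOPS figure at a given I/O size, in MB/s."""
    return iops * block_bytes / 1_000_000

small = throughput_mb_per_s(55_000, 512)   # about 28.16 MB/s
large = throughput_mb_per_s(55_000, 4096)  # about 225.28 MB/s

print(f"512B I/Os: {small:.2f} MB/s")
print(f"4K I/Os:   {large:.2f} MB/s")
```

    Whether a given controller can actually sustain the second figure is exactly what the 512B number does not tell you.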

    All SSD's do is flip the Capacity/IOPS equation on the back end. Before, you ran out of drive IOPS before you ran out of capacity. Now, you get to run out of capacity before you run out of IOPS on the drive side.

    Even if you have sufficient capacity (due to the rapid increase in SSD capacity), you are still going to run out of IOPS capacity on the RAID controller before you run out of IOPS or bandwidth on the drives. The RAID controller still has a lot of work to do with each I/O, and that isn't going to change just because the back-end drives are now more capable.

    SirWired

    1. Re:All wrong. by jon3k · · Score: 1

      "All SSD's do is flip the Capacity/IOPS equation on the back end. Before, you ran out of drive IOPS before ran out of capacity. Now, you get to run out of capacity before you run out of IOPS on the drive side."

      Thank you so much for summarizing that point so succinctly, I'm stealing that line, hope you don't mind :)

    2. Re:All wrong. by AllynM · · Score: 2, Interesting

      Well said. I've found using an ICH-10R kills that overhead, and I have seen excellent IOPS scaling with SSDs right on the motherboard controller. I've hit over 190k IOPS (single sector random read) with queue depth at 32, using only 4 X25-M G1 units. The only catch is the ICH-10R maxes out at about 650-700 MB/sec on throughput.

      Allyn Malventano
      Storage Editor, PC Perspective

      --
      this sig was brought to you by the letter /.
    3. Re:All wrong. by mysidia · · Score: 1

      1) Most high-end RAID controllers aren't used for file serving. They are used to serve databases. Changes in filesystem technology don't affect them one bit, as most of the storage allocation decisions are made by the database.

      I don't think that's right.

      What about block copy-on-write filesystems like ZFS that essentially convert random writes into sequential writes but result in essentially random layout on the filesystem?

      Meaning eventually lots of random reads for searches, after the data's been updated. But the disk access pattern is drastically different than it would be with a more traditional fs such as ext2.

      The characteristic absence of random writes, and the efficiency improvements that can be achieved with massive read caching, can have drastic impacts on DB disk performance.

    4. Re:All wrong. by sirwired · · Score: 1

      For a file server, sure ZFS is a great solution, because most of the data just sits there and is never modified. NetApp has used copy-on-write for years in WAFL for this reason.

      But the writers of databases are not morons and techniques such as copy-on-write are not new; the DB's already do what they can to optimize how writes are committed to the database. They don't need the help of a filesystem to optimize this process, as the possible optimizations have already been made. If random writes were a problem before with a given database, they are still going to be a problem.

      Most of the whitepapers on ZFS w/ databases show essentially no major performance difference between Direct I/O, UFS or ZFS with databases of any significant size.

      SirWired

    5. Re:All wrong. by afidel · · Score: 1

      So we need something that can keep up with an ICH-10R but can do RAID5/10 and has BBWC. For $500-1500 someone should be able to do that no problem, it just hasn't been a priority before because you couldn't get enough IOPS out of attached devices for the controller to be the limiting factor (16x15k SFF's still can't do as many 4k random IOPS as a single x-25e). Btw I've found the same thing but using the ICH8R, it can smoke any of the RAID controllers I have available for pure IOPS when coupled with the x-25e.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  12. Mod Parent Up by sirwired · · Score: 1

    The fact that SSD perf drops like a rock when you actually need to be absolutely sure the data makes it to disk is a huge factor in enterprise storage. No enterprise storage customer is going to accept the possibility their data goes down the bit-bucket just because somebody tripped over the power cord. Enterprise databases are built around the idea that when the storage stack says data has been written, it has, in fact, been written. Storage vendors spend a great deal of money, effort, and complexity guaranteeing the non-volatility of write cache; for SSD to ignore that requirement when publishing performance data is fundamentally dishonest.

    SirWired

  13. the on board chips are not built for high speed / by Joe+The+Dragon · · Score: 1

    The on-board chips are not built for high speed / using all the ports at the max at one time.

  14. No matter how fat you make the pipe by countertrolling · · Score: 1

    Somebody will find a way to clog it up.

    Where's our "paperless" society?

    --
    For justice, we must go to Don Corleone
  15. The next bottleneck? by Sjefsmurf · · Score: 1
    Nothing new here.

    Anyone seriously into benchmarking or high performance applications would know that raid controllers have been a bigger bottleneck than the hard drives for ages already.

    It's only in the last 2-3 years or so that raid controllers have become fast enough to properly deal with the performance of the 6 to 8 15k rpm drives that a normal 2U server can hold, and even today many of the server raid cards out there cannot do this.

    Raid card performance has easily been the biggest differentiator in server performance for anyone who needed a reasonable amount of I/O capacity on their servers. Most servers have been reasonably equal in terms of memory and network performance. After all, they are all built around a very limited number of CPU and chipset architectures, so there is only so much that can differ there, and it is a long time since gigabit network HW for servers did not manage to fill a gigabit link.

    Raid cards, on the other hand, have major differences in architecture, software and processors. Proper HW raids are basically small computers on a card. They have their own CPU, memory and OS. This isolates them from the host they are plugged into and protects the data even if the host crashes. The raid card will normally not crash and holds all data in the battery-backed cache, which means a great deal for critical data and massively reduces the chances that you need to do consistency checks/validation and rebuilds after a crash on the host server, which is equally important for a server.

    Unfortunately as a result of all that extra complexity, you also get the potential for large performance variations between different raid cards.

    That said, a good quality raid card now definitely helps performance in most scenarios and easily outperforms software raid for most server usage that includes a reasonable amount of writing, as long as you have that battery backup on your cache so you can safely enable write-back caching.

    Just do your homework and make sure you get a good card when you shop. The better cards can easily be 2-3x faster than the worst.

    1. Re:The next bottleneck? by PiSkyHi · · Score: 1

      If you do a cost comparison, the software RAID beats Hardware, because instead of buying a fast RAID card, you've got more storage space and speed.

  16. Re:Wait. You mean my SAN is Dead? by jon3k · · Score: 1

    Cost per IOPS yes, several vendors are selling SSD now. Cost per terabyte, no, SSD isn't even close. What we're seeing now is Tier 0 storage using SSDs. It fits in between RAM cache in SAN controller nodes and on-line storage (super fast, typically fiber channel storage vs near line).

    So previously it looked like (slowest to fastest): SATA (near-line) -> Fiber Channel (online) -> RAM cache

    Now we'll have: SATA -> FC -> SSD -> RAM

    And in a few years after the technology gets better and much less expensive, we'll see: SATA -> SSD -> RAM

    And hopefully eventually: SSD -> Memristors :)

  17. SSD killed the Raid(io) star by Latinhypercube · · Score: 1

    SSD killed the Raid(io) star. Really, who needs the fuss of raid. Unless it's for backup, there is no need for raid as far as speed goes. SSDs are already bottlenecking the 3.0Gb/s SATA II. A single SSD can produce the same throughput as 4 raided Raptors (=fast drives). Plus anyone can install an SSD into an existing setup; Raid requires a lot of reinstalls, drivers, etc.

    1. Re:SSD killed the Raid(io) star by Rockoon · · Score: 1

      The 3Gb/sec rating is on each individual port, not on the bridge between the SATA controller(s) and the CPU. The bridge between the controller and CPU can theoretically max out the system bus (we measure that in GB/sec instead of Gb/sec). There are plenty of SATA controllers that push well over 3Gb/sec towards the system; it is only that each individual SATA300 device is capped at 3Gb/sec.
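
      To make the Gb/sec versus GB/sec distinction concrete, here is a rough sketch. It assumes the 8b/10b line encoding that SATA uses (10 bits on the wire per payload byte), which is where the "300" in SATA300 comes from. The function name and the 6-port controller are made up for illustration.

```python
# Rough per-port vs. aggregate SATA throughput.
# Assumption: 8b/10b encoding, so 10 line bits carry 1 payload byte.

def sata_port_mb_per_s(line_rate_gbit: float) -> float:
    """Usable MB/s for one SATA port at the given line rate in Gb/s."""
    bits_per_second = line_rate_gbit * 1e9
    return bits_per_second / 10 / 1e6  # 10 line bits per payload byte

per_port = sata_port_mb_per_s(3.0)  # one SATA300 port
ports = 6                           # hypothetical controller with 6 ports
aggregate = per_port * ports        # upper bound the controller could push upstream

print(f"per port: {per_port:.0f} MB/s, {ports} ports: {aggregate:.0f} MB/s")
```

      The per-port cap stays at 300 MB/s, while the controller as a whole is limited only by its link to the host.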

      --
      "His name was James Damore."
    2. Re:SSD killed the Raid(io) star by Anonymous Coward · · Score: 0

      ... 4 raided Raptors (=fast drives)...

      Raptors? Fast? In which universe?

      Go back to your basement and play with your toy computers. We're talking real hardware here, son.

    3. Re:SSD killed the Raid(io) star by dbIII · · Score: 1

      RAID is not a backup. It's a way to have very big volume sizes or to be able to keep going if you lose a disk or two.

  18. Colo Datacenters VS Real Datacenters by Anonymous Coward · · Score: 0

    Not sure what Datacenters you have been visiting but it sounds like you need to get out more. In a standard Colocation datacenter you see a lot of data that lives in midgrade x86 server raid subsystems. You also see a lot of bakers racks filled with crap white box systems.

    In a real datacenter the only raid seen is a raid 1 for the boot drives to get the server up into the operating system. The data lives on the SAN. If the server suffers a hardware failure or other problem the admin is able to assign the LUN to another server and get the application back up during the repairs. Clustered applications are even able to do this on their own and page the admin to let him know it's time for a service call.

    And of course you mention nothing about ZFS which is even able to judge the read and write speed of its devices. A raidz configuration of a mix between regular spindles and ssd's would be able to balance between the two depending on the needs of the operations involved.

    SSD's won't be in the datacenter for a long time. The 15k rpm fibre channel drives found in most EMC hardware are robust and scary fast on top of being extremely fault tolerant with BCV's and multiple LUN's. I wonder how well SSD's would do in a real world test of a multiple LUN configuration with 24 hour hammering from 5000 hosts on the other end of an 8Gb fibre?

    1. Re:Colo Datacenters VS Real Datacenters by dstar · · Score: 1

      In a real datacenter the only raid seen is a raid 1 for the boot drives to get the server up into the operating system. The data lives on the SAN.

      Hint: Do you think that's a raw drive you're seeing? No, you're seeing... a RAID5 volume presented as a drive by the array.

      Not having RAID is simply not feasible in a 'real' datacenter, because you'll lose a disk or two each week -- if not day.

      But then, what would I know -- I only work on a team handling several petabytes of space, having come from a team handling several *more* petabytes.
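
      A back-of-the-envelope sketch of why "a disk or two each week" is plausible at that scale. The disk count and the annual failure rate below are illustrative assumptions, not figures from this thread.

```python
# Expected disk failures per week for a large fleet.
# Assumptions (hypothetical): 3000 disks, 3% annual failure rate.

def expected_failures_per_week(disk_count: int, annual_failure_rate: float) -> float:
    """Average number of disks expected to fail in one week."""
    return disk_count * annual_failure_rate / 52

weekly = expected_failures_per_week(disk_count=3000, annual_failure_rate=0.03)
print(f"expected failures per week: {weekly:.2f}")
```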

    2. Re:Colo Datacenters VS Real Datacenters by mysidia · · Score: 1

      In a real datacenter the only raid seen is a raid 1 for the boot drives to get the server up into the operating system. The data lives on the SAN.

      In real datacenters, the servers boot from SAN and there are no hard drives in them. If a server fails, a new one can be brought up in seconds, by cluster management software that automatically takes care of shooting the other node in the head, re-mapping all the LUN masking to expose the disks to the spare server's FC port WWNs, placing the spare server into the right VLANs, and booting it up. The admin doesn't need to be notified, except as a friendly reminder to come pick up the dead server, to send it back to the manufacturer for warranty service.

  19. The Real Answer... by billybob_jcv · · Score: 1

    ...Our EMC sales rep has been putting the hard sell on us to buy some SSD product. I think they are worried about their profit margins on conventional drives, and they want to move customers to a product with a higher margin - and along the way they can also try to get you to upgrade head units, etc.

    1. Re:The Real Answer... by jon3k · · Score: 1

      Higher margins? Are you kidding me? The SAN vendors I've worked with, and I won't name names, just that their names rhyme with EMC and 3PAR, price 146GB FC drives at or over $1,000. And we're talking the exact same Hitachi drives you can buy for $500 anywhere else. But they're "certified" for the SAN and you have to buy them from the SAN vendor or there goes your warranty. It's absolutely criminal, pure highway robbery.

  20. It's all a bit moot... by Anonymous Coward · · Score: 0

    This is a pretty simplistic view. As a senior storage engineer, I have conversations like this quite often. At the enterprise storage level, RAID controller hardware does not really figure in things. In addition, and perhaps more pertinently, there is a reasonable chance that in the next few years the RAID paradigm may pass into history and that disk interface models that incorporate linear power/throughput growth in enterprise storage subsystems, such as IBM's XIV, will take over. It's certainly a quantum improvement in thinking, at least. It will also deal with all of these smug statistical analyses that talk about RAID rebuild times growing (in line with spindle size growth) such that second disk failures prior to the rebuild of an original disk failure take out an entire array.

    1. Re:It's all a bit moot... by dstar · · Score: 1

      It will also deal with all of these smug statistical analyses that talk about RAID rebuild times growing (in line with spindle size growth) such that second disk failures prior to the rebuild of an original disk failure taking out an entire array.

      If you aren't using RAID6, I will point my finger and laugh when this happens to you. :)
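
      For anyone who wants to see the rebuild-window argument in numbers, a minimal sketch. The rebuild rate, MTBF and array size are assumptions picked for illustration, and the failure model is a plain exponential one, which real drives only loosely follow.

```python
import math

def rebuild_hours(capacity_gb: float, rebuild_mb_per_s: float) -> float:
    """Time to rewrite one whole replacement disk at a steady rebuild rate."""
    return capacity_gb * 1000 / rebuild_mb_per_s / 3600

def p_another_failure(surviving_disks: int, hours: float, mtbf_hours: float) -> float:
    """Chance that at least one surviving disk fails during the rebuild window
    (exponential failure model, independent disks)."""
    return 1 - math.exp(-surviving_disks * hours / mtbf_hours)

# Hypothetical 8-disk array: 7 survivors, 50 MB/s rebuild, 1,000,000 h MTBF.
for capacity_gb in (146, 1000, 2000):
    hours = rebuild_hours(capacity_gb, rebuild_mb_per_s=50)
    risk = p_another_failure(surviving_disks=7, hours=hours, mtbf_hours=1_000_000)
    print(f"{capacity_gb:>5} GB: rebuild {hours:5.1f} h, second-failure risk {risk:.6%}")
```

      Bigger spindles stretch the window, so the exposure grows with capacity. RAID6 survives the second failure, which is the point above.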

    2. Re:It's all a bit moot... by Anonymous Coward · · Score: 0

      RAID6 is all well and good but it's more designed for SATA/FATA spindles. Using it for FC spindles is a bit of a waste of time if you have confidence in your environmentals like aircon.
      XIV does away with RAID completely. The whole RAID-rebuild-time argument is not even relevant in that context.

    3. Re:It's all a bit moot... by afidel · · Score: 1

      EVA does much the same thing as XIV for rebuilds while allowing you to use 15k spindles.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  21. data center by Anonymous Coward · · Score: 0

    Currently network bandwidth is the limit in data centers. I don't see this changing anytime soon.

  22. Re:Wait. You mean my SAN is Dead? by dbIII · · Score: 1

    Not dead, just ready to be overtaken by something else. The bizarre idea of a redundant array of fileservers with parallel NFS is already done by Panasas - but I'm too scared to get a price in case I have a heart attack.

  23. Let's try that again by dbIII · · Score: 1

    There's also RAID6, but it just gives you a bit more leeway in the number of drives you can LOSE.

  24. Not that random by phorm · · Score: 1

    Some pages, however, will be rather consistent.

    Common CSS files, a site's front page, etc. For scripted pages (php/perl/asp/whatever, as well as javascript) there will be a whack of commonly included modules inside the app, plus all modules or stuff that might be part of PHP/etc and pulled on an as-needed basis.

    Having worked in a company with some fairly high-traffic sites (maybe not as big as the giants, but big enough), caching of those makes a HUGE difference in performance.

    I don't disagree that SSDs will make a big difference in all that other, random IO, but there's plenty of consistent things that can be dealt with (see: cached in memory) that a lot of people simply overlook.
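
    A minimal sketch of that point, using Python's `functools.lru_cache` as a stand-in for whatever cache a real site would use. The `load_asset` function, the paths and the request mix are hypothetical.

```python
from functools import lru_cache

disk_reads = 0  # counts how often we actually go to storage

@lru_cache(maxsize=128)
def load_asset(path: str) -> str:
    """Pretend to read a file from disk; only cache misses reach this body."""
    global disk_reads
    disk_reads += 1
    return f"contents of {path}"

# A request mix dominated by a few common files.
requests = ["/style.css", "/index.html", "/style.css", "/style.css", "/index.html"]
for path in requests:
    load_asset(path)

info = load_asset.cache_info()
print(f"requests: {len(requests)}, disk reads: {disk_reads}, cache hits: {info.hits}")
```

    Five requests, two trips to disk. The rest never touch the IO subsystem at all, fast or slow.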

  25. Definitely a problem by phorm · · Score: 1

    I've been hit by this before. New app rolls out, servers take a dive. Having some knowledge of DB's myself, I hit it with MTOP and find HUGE query tables, generally caused by extremely poorly written blocks of queries/code (doing things in code that should be done in the query, or vice versa) and shit-poor indexing or query structure.

    Now I'm no DB expert, but when I add a few indexes/changes and suddenly that 45s query is going down to less than 1... then yeah poor, sloppy, or just lazy coding becomes a much bigger issue than lack of hardware or a poorly configured/performing server. Unfortunately there's often a big divide between the IT admins and the programmers, so collaboration in this regard gets lost as you get cowboys on both sides.
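
    A small sketch of the index effect. It uses an in-memory SQLite database as a stand-in (not the MySQL setup that MTOP implies), and the `orders` table, its columns and the index name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

query = "SELECT * FROM orders WHERE customer_id = ?"

def plan(sql: str) -> str:
    """Return SQLite's query plan description for the given statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

before = plan(query)  # no index yet: the whole table gets scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # the planner can now search via the index

print("before:", before)
print("after: ", after)
```

    Same query, same hardware. The only change is that the planner no longer has to walk every row.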

  26. There'll always be a bottleneck (sorry, England) by FrozenGeek · · Score: 1

    Unless someone designs an entire system, top to bottom, there will always be a slowest piece (aka a bottleneck). All this means is that RAID controllers will be the bottleneck until someone designs a better RAID controller. Then the bottleneck will be some other part of the system. Hard to see what the fuss is about.

    --
    linquendum tondere
  27. How do you guarantee consistency? by Pinky's+Brain · · Score: 1

    The main speed up provided by hardware raid is reliable deep write buffering ... I don't see how parallel file systems will make that advantage go away.

  28. Re:STOP USING SSDS AS HDS by mysidia · · Score: 1

    Caching, and efficient wear-levelling algorithms incorporated into the drives are designed to prevent exactly that.

  29. Dubious claims by Anonymous Coward · · Score: 0

    He may be on to something, but not in the form in which TFA is now.
    pNFS: a file of size A striped over 3 servers becomes A/3 (smaller) which actually increases randomness.
    Many vendors (e.g. NetApp, Sun and the whole bunch that's been focusing on large sequential I/O for many years (DDN, IBM)) already have RAID controllers that do a good job with non-random I/O.
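
    A sketch of the A/3 point, assuming plain round-robin striping with a fixed stripe unit. Real pNFS layouts are handed out by the metadata server and can look different, and the function name and sizes here are made up.

```python
def stripe_layout(file_size: int, servers: int, stripe_unit: int) -> list:
    """Bytes landing on each server when a file is striped round-robin."""
    per_server = [0] * servers
    offset = 0
    stripe_index = 0
    while offset < file_size:
        chunk = min(stripe_unit, file_size - offset)  # last stripe may be short
        per_server[stripe_index % servers] += chunk
        offset += chunk
        stripe_index += 1
    return per_server

# A 3 MiB file over 3 servers with a 64 KiB stripe unit.
layout = stripe_layout(3 * 1024 * 1024, servers=3, stripe_unit=64 * 1024)
print(layout)
```

    Each server sees a third of the file, and it sees it as many small pieces rather than one long sequential run.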

  30. Re:Wait. You mean my SAN is Dead? by blackjackshellac · · Score: 1

    True, but in the context of the article it would be more something like,

    RAID_CONTROLLER -> SATA -> SSD -> RAM

    I suspect that the hardware raid controller can easily be replaced by the network,

    [Network/GIGE/10GE/etc] -> SATA -> [SSD -> RAM]

    The way things stand right now there's no real benefit that I can see from sharing
    SSD across the network, even though the network is certainly fast enough to compete
    with latencies on the SSD.

    Network shared block devices or more probably "object stores" are an interesting
    option, especially for read only or read mostly applications like web provision.

    --
    Salut,

    Jacques

  31. Re:Wait. You mean my SAN is Dead? by jon3k · · Score: 1

    "[Network/GIGE/10GE/etc] -> SATA -> [SSD -> RAM]"

    Exactly, in the context of SANs, you typically don't have RAID controllers.

    I think you underestimate performance requirements if you don't see a need for SSDs. Typical SANs operate at microsecond latencies across the cable plant, whereas mechanical disks have seek latencies in the milliseconds. Also, SSD throughput is already twice that of mechanical disks, and (read) IOPS aren't even comparable: SSDs are an order of magnitude faster. And just wait until we see fiber channel SSDs or SATA 6G SSDs that will be doing over 500MB/s right out of the gate.
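
    Rough numbers behind the millisecond claim, as a sketch. The seek time and the SSD read latency are assumed round figures, not measurements of any particular product.

```python
def disk_random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Random IOPS of one spindle: average seek plus half a revolution per IO."""
    rotational_ms = 60_000 / rpm / 2  # average wait is half a rotation
    return 1000 / (avg_seek_ms + rotational_ms)

def latency_bound_iops(latency_ms: float) -> float:
    """IOPS of a device serving one request at a time at a fixed latency."""
    return 1000 / latency_ms

fc_15k = disk_random_iops(rpm=15_000, avg_seek_ms=3.5)  # assumed 3.5 ms seek
ssd = latency_bound_iops(latency_ms=0.1)                # assumed 0.1 ms read

print(f"15k spindle: ~{fc_15k:.0f} IOPS, SSD at 0.1 ms: ~{ssd:.0f} IOPS")
```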

  32. Desktop and Server RAID lacking by Anonymous Coward · · Score: 0

    The problem isn't a matter of IOPS, it's bandwidth. A RAID controller doesn't care if you're using 512-byte blocks or 4k. What matters is the request rate, size of requests, and striping size. A 64kB read is a 64kB read, whether it's a 128 block request to the drive, or a 16 block. The larger block size is easier for some caching algorithms because there are fewer blocks to manage. Unless the databases change how they work, they're still going to be making a ton of 4k requests.
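
    The block arithmetic, spelled out as a trivial sketch (the helper name is made up):

```python
def blocks_per_request(request_bytes: int, block_bytes: int) -> int:
    """Number of device blocks needed to cover one request."""
    return request_bytes // block_bytes

request = 64 * 1024  # a 64kB read
print(blocks_per_request(request, 512))   # 512-byte blocks
print(blocks_per_request(request, 4096))  # 4k blocks
```

    Either way the same 64kB crosses the wire; only the bookkeeping differs.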

    There are a couple things holding back RAID performance in the low end hardware. First, the typical card is 8-lane PCIe or less, which means a limit of 2, 4, or 8 GB/s, depending on which generation of PCIe is used. That can be eaten by 10-40 SSDs. With R1, that doubles the drive I/Os, an R5 RMW quadruples it, and R6 sextuples it. An 8-lane PCIe 3 means a potential back end of 48GB/s. SAS-2 is 6Gb/s, so 600MB/s per channel means 80 ports are needed, but a typical backplane controller only has 8. This carries over to whether it's a hardware RAID controller or host-based RAID/file system using a SAS controller. Adding to the hardware controller's problems, the memory for RAID5/6 or read/write cache can't support that backend speed. PC3-12800 tops out at less than 13GB/s. Finally, the processors on the RAID controllers are embedded class, so there's one or 2 MIPS, PPC, or ARM CPUs. They range in speed from a couple hundred MHz to 1 GHz, which is pretty anemic for managing all those requests. Going to a faster processor means a lot more heat to manage on a daughter card.
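
    The same arithmetic as a sketch, taking the round figures above at face value (2/4/8 GB/s for an x8 link, 600MB/s per port, and the 2x/4x/6x back-end multipliers). The names are made up, and this is the worst case where every host write is a small read-modify-write.

```python
import math

PCIE_X8_GB_PER_S = {1: 2, 2: 4, 3: 8}                      # host link, by generation
BACKEND_IO_MULTIPLIER = {"RAID1": 2, "RAID5": 4, "RAID6": 6}
PORT_MB_PER_S = 600                                        # one 6Gb/s port

def backend_gb_per_s(pcie_gen: int, raid_level: str) -> int:
    """Back-end bandwidth needed to keep the host link saturated with writes."""
    return PCIE_X8_GB_PER_S[pcie_gen] * BACKEND_IO_MULTIPLIER[raid_level]

def ports_needed(backend_gb: float) -> int:
    """Number of 600MB/s ports needed to carry that back-end bandwidth."""
    return math.ceil(backend_gb * 1000 / PORT_MB_PER_S)

for level in BACKEND_IO_MULTIPLIER:
    need = backend_gb_per_s(3, level)
    print(f"PCIe 3 x8, {level}: {need} GB/s back end, {ports_needed(need)} ports")
```

    With only 8 ports on a typical controller, the back end runs out long before the host link does.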