AOL Spends $1M On Solid State Memory SAN
Lucas123 writes "AOL recently completed the roll out of a 50TB SAN made entirely of NAND flash in order to address performance issues with its relational database. While the flash memory fixed the problem, it didn't come cheap, at about four times the cost of a typical Fibre Channel disk array with the same capacity, and it performs at about 250,000 IOPS. One reason the flash SAN is so fast is that it doesn't use a SAS or PCIe backbone, but instead has a proprietary interface that offers up 5 to 6Gb/s throughput. AOL's senior operations architect said the SAN cost about $20 per gigabyte of capacity, or about $1 million. But, as he puts it, 'It's very easy to fall in love with this stuff once you're on it.'"
What is surprising to me is not the amount of money spent on what was bought, but the fact that AOL has any performance issues at all. They still have users? They have an entire database of users?
You can't handle the truth.
> AOL recently completed the roll out of a 50TB SAN made entirely of NAND flash
ME TOO!!!
As a DBA, I would love to have solid-state storage instead of needing to segment my databases properly and work with the software dev guys to make sure we have reasonable load distribution.
Where can I get someone to pay a million dollars so I can do substandard work?
---
According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.
It does mention that SAS can 'only' deliver 5Gbit/sec - but is that not the bandwidth for each disk, and thus not a problem at all?
The reason the SSD is so much faster is most likely the nice seek time of SSDs. And I really like the concept of them using flash chips directly. Now we just need something cheaper than $20/GB :}
You can read more about that here:
http://www.google.com/search?q=High-Speed+Data+Link
AOL have a website?
I wonder which machines the LUNs are presented to. I'm guessing either very high-end x86 hardware, SPARC, or POWER. Most machines out there would not even notice the performance increase.
My impression is that this has been going on for some time now at all the larger database operations, and that one reason SSDs have not yet come down in price is that the best units and tech are going to the big companies as fast as they can get them from the manufacturers. I wouldn't be surprised to see someone like Google say something like "yawn, 50TB" and reveal that they have PETABYTE versions already out there.
If you run a database of any size, especially one with a high read-to-write ratio, SSD would only make things faster. And speed counts.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
"but instead has a proprietary interface that offers up 5 to 6Gb/s throughput."
You know that SAS offers 6Gb/s throughput and Infiniband up to 300Gb/s (with 8 and 16 being more common).
Either way, $1M for a bunch of SAS SSD (even SAS NVRAM) is way overpriced imho. They could've done it cheaper.
Custom electronics and digital signage for your business: www.evcircuits.com
Just curious, have they exhausted all of their software avenues for this? While yes, I understand they have a huge relational DB, I know other companies that are just as big or bigger and they have next to no issues. Maybe it's just poorly designed? That's a hell of a lot of (albeit super sexy) hardware to throw at what could be a software problem. Thoughts?
Once you figure in the total energy savings (reduced power needs, reduced cooling needs, etc.) over the lifetime of the drive, I wonder how much more expensive it really is. I can't wait for SSD to become more affordable. I'd like to have that in our SANs too.
Remember when AOL used to send you so many floppies in the mail, you didn't need to go out and buy them yourself?
I'm looking forward to getting 50 TB SANs in the mail.
Does this mean AOL is doing something novel and progressive? Something doesn't feel right about that...
I'm so confused!
I'll meet you at the intersection of "Should be" and "Reality"
I wonder what the read/write rating is vs. a hard disk?
Wikipedia puts flash at 1,000,000 program-erase cycles
What one fool can do, another can. (Ancient Simian Proverb)
Although I'm certain the person designing the SAN had a blast doing so and did an excellent job, it still seems it would have been faster and easier to go with a pre-existing SAN/DB system such as Oracle's Exadata2.
I've personally witnessed the Exadata2 process close to the advertised 1,000,000 IOPS (well, it was in a controlled demo environment run by Oracle, but still, it was impressive).
I'd also be curious how much the second SAN would cost. If the first one costs $1M, will the second one be cheaper and thus justify developing the system in house?
Not the brightest people in the world there at AOL (what do you expect?). I can't wait to see what their failure rate is after a year or so of usage.
Unless they have improved recently, my experience with SSD/flash drives is that they fail quite often. I have never had one last more than a year under relatively heavy use (developer workstations and database usage).
I would put good money on them losing at least 50% of the whole array over a one-year period (I actually think the odds are pretty good that they lose 100%, but I'll leave some room since SSDs could have improved since I tried them about a year ago).
It's very easy to fall in love with this stuff once you're on it.
I said the same thing about coke in the 70's....
I guess what I'm saying is, no one should loan money to AOL until they admit they have a problem.
From summary:
What are they talking about? The Violin Memory website says the appliances themselves support FC, 10 GbE, and InfiniBand connections. Their performance page says that the appliance can be directly connected to a PCIe bus, presumably using some sort of pass-through interface card, but what physical connector and media are used?
It's very easy to fall in love with this girl once you're on her.
There, fixed it for you.
Although I suppose it's possible you were talking about drugs, alcohol, cigarettes, caffeine, candy, or pizza. Some people call those things 'stuff'
But surely one doesn't really fall in love with a million dollar box that will be worth $100 in 5 years.
And your computer apps will adjust to the storage capabilities of your solid-state storage and require yet even more performance at even higher capacities.
Ooops... back to mechanical disks.
6Gb/s huh? OK, so I'm assuming you have some special cable connecting to the SAN... I know offhand that Dell sells the MD3200, a DAS unit that transfers 6Gb/s... although I estimated it at about 10GB in 30 seconds.
I've got to be missing something here. The seek times are probably out of this world with this "specialized" SAN, but then we have EqualLogic SANs that can have 48 SSDs and 10Gb/s...
Hey AOL - you are in the arctic right? Can I interest you in some of this amazing ice?
Hei folks,
$20/GB is not that much IMHO... is that net capacity, and does it include geographical replication? Depending on the answer, the real news could be that SSD storage is much more competitive than one might have thought... :D
I wonder who is the instigator behind all the stupid things I do - Altan
Um. Am I the only one who thought the speculated price was a bit low?
I would be surprised if that $20/GB isn't the raw per-GB cost and the 50TB is the usable figure for how much storage they ended up with.
That means there's RAID in there, probably spares, any other overhead and hmmm did I see that it's mirrored across two six-node clusters?
$1M 'tain't that much for some screaming storage and my first thought was "wow...that is really reasonable for that much solid state"
I look to Google, Facebook, and other massively scaled companies that build highly distributed systems running on low availability commodity systems. These guys are not throwing Solid State Memory at biggus relational databases. Sorry, but this is a bandaid for a dinosaur.
This is clearly an application where $/IOPS is the problem, not $/GB. If they need 250K random IOPS, they'd need something on the order of 800-850 FC disk drives and a honking big array to house them, and they certainly wouldn't see any change from $1M for that configuration from EMC. Add the running costs in terms of power for that configuration and the flash stuff looks really attractive.
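For anyone who wants to sanity-check that drive count, here is a minimal sketch. The 250K IOPS target and the 800-850 drive range are the figures from the comment above; the function name is made up for illustration, and real arrays would also differ because of caching and RAID overhead.

```python
# Back-of-the-envelope check: what per-drive IOPS does "800-850 FC drives
# for 250K random IOPS" imply? Figures are taken from the comment above.
TARGET_IOPS = 250_000


def implied_iops_per_drive(target_iops, drive_count):
    """Random IOPS each drive must sustain to hit the target (hypothetical helper)."""
    return target_iops / drive_count


for drives in (800, 850):
    per_drive = implied_iops_per_drive(TARGET_IOPS, drives)
    print(f"{drives} drives -> {per_drive:.1f} IOPS per drive")
```

That works out to roughly 300 IOPS per spindle, which is on the optimistic side for a spinning disk, so the real drive count could well be higher.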
FTFA: They're using the NAND memory on a custom board sitting on a PCIe bus, they are getting 4GB/sec.
SAS is capable of 6 Gb/s; that's why Fibre Channel is being phased out. AOL isn't doing anything that any other enterprise isn't doing. The only difference is that somebody decided to write about it.
The developers and DBAs on this project are not familiar with the 'CREATE INDEX' statement.
They wanted performance and went *RAID 5*? That pretty much sums the entire approach up. Let's not optimise the application first and the database second, but instead hide the problem by throwing hardware at it. Then what we'll do is use a RAID configuration that hobbles the write performance of the arrays, and let's not mention what happens to performance when we lose a disk (don't say it won't happen).
Sure, RAID 5 is the answer to some things, but not when the question is database *PERFORMANCE*.
Also - latency is more important than IOPS. I don't care how many IOPS you can do; if your latency is high, the performance won't be. Most garden-variety storage engineers don't seem to grasp this concept.
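To put a number on the latency point, here is a rough sketch using Little's law (throughput = outstanding requests / latency). The latencies and queue depths below are illustrative assumptions, not measurements from AOL's system or any particular product.

```python
# Little's law sketch: achievable IOPS = outstanding I/Os / per-I/O latency.
# All latency and queue-depth values here are assumed for illustration only.


def achievable_iops(outstanding_ios, latency_us):
    """IOPS a workload can reach with the given concurrency and latency."""
    return outstanding_ios * 1_000_000 / latency_us


# A serial workload (one I/O in flight at a time), such as a chain of
# dependent index lookups, is bounded purely by latency.
print(achievable_iops(outstanding_ios=1, latency_us=5_000))   # assumed ~5 ms disk
print(achievable_iops(outstanding_ios=1, latency_us=100))     # assumed ~0.1 ms flash

# A device can only reach a big headline IOPS figure at high latency
# if the application keeps many I/Os in flight at once.
print(achievable_iops(outstanding_ios=32, latency_us=5_000))
```

The takeaway is that a headline IOPS number says little about a workload that cannot issue many requests in parallel.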
I'm enjoying the comments which are sarcastically asking whether AOL is doing anything amazing to justify this investment. A million dollars is not a big deal in terms of capital investment, even to a firm which has taken its share of losses recently. If the choice is between an extra three to four months of performance problems while the developers work out the best way to tune the DB, and spending a million dollars on storage you probably would have bought in some form anyway, then that's no choice at all if you're an operations director or whoever approves this sort of thing.
If you compare their IOPS price to a Fibre Channel drive you will find that AOL got quite the bargain. 250,000 IOPS / 180 IOPS per drive = about 1,389 10k RPM Fibre Channel drives. At $2,000 a pop that is about $2.8M, so the $1M flash SAN works out to roughly $1.8M in savings.
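The arithmetic above is easy to reproduce. This is a rough sketch using only the figures quoted in the comment (180 IOPS and $2,000 per drive, $1M for the flash SAN); it ignores enclosures, controllers, RAID overhead, and power, and the helper names are hypothetical.

```python
import math

# Figures quoted in the comment above; everything else is simplified away.
TARGET_IOPS = 250_000
IOPS_PER_DRIVE = 180        # assumed 10k RPM Fibre Channel drive
COST_PER_DRIVE = 2_000      # dollars, as quoted
FLASH_SAN_COST = 1_000_000  # dollars, from the summary


def drives_needed(target_iops, iops_per_drive):
    """Whole drives required to reach the target (rounded up)."""
    return math.ceil(target_iops / iops_per_drive)


def disk_array_cost(target_iops, iops_per_drive, cost_per_drive):
    """Drive-only cost of a disk array that matches the IOPS target."""
    return drives_needed(target_iops, iops_per_drive) * cost_per_drive


drives = drives_needed(TARGET_IOPS, IOPS_PER_DRIVE)
disk_cost = disk_array_cost(TARGET_IOPS, IOPS_PER_DRIVE, COST_PER_DRIVE)
savings = disk_cost - FLASH_SAN_COST
print(f"{drives} drives, ${disk_cost:,} in disks, ${savings:,} saved")
```

Rounding the drive count up, the numbers land close to the ones quoted above, and the gap would only widen once the array hardware itself is priced in.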
What the hell does AOL need a database for? Users still on hold trying to cancel their accounts?
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
at about four times the cost of a typical Fibre Channel disk array with the same capacity, and it performs at about 250,000 IOPS. One reason the flash SAN is so fast is that it doesn't use a SAS or PCIe backbone, but instead has a proprietary interface that offers up 5 to 6Gb/s throughput. AOL's senior operations architect said the SAN cost about $20 per gigabyte of capacity, or about $1 million.
A $250,000 Fibre Channel disk array doesn't have a SAS or PCIe backbone either. There are plenty of valid reasons flash can be "faster" in one measure or another than disk, but... I feel dumber just for having to say "Fibre Channel disk array" and "SAS" in the same sentence. Urgh... /.
It is hard to know anything for sure with this limited amount of info. But it appears to me that they have not accomplished such a great feat.
I put together a server this year that pushes over 9 GB/s. I did this with a mere 150 2.5-inch drives (144 in RAID 10 + 6 live spares). This was SAS 2.0 of course, because in the real world SAS kicks FC's A**.
We found that the real bottleneck to throughput is not the drives and not the SAS cards. We have 8 SAS 2.0 lanes coming into each card, multiply that by 6 cards, and you have a heck of a lot of potential.
No, the real problem is you saturate your PCIe slots, and chipsets sometimes choke when you feed this much data. So, the chipset and PCI-e bus tend to be the restraining factor, not the archaic rotating platters.
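As a rough illustration of how much SAS headroom that setup has, here is a sketch of the aggregate lane bandwidth. The 6 cards, 8 lanes per card, and 9 GB/s figures come from the comment above; the 6 Gb/s line rate and 8b/10b encoding are standard for SAS 2.0, and the helper names are made up.

```python
# Aggregate SAS 2.0 payload bandwidth for 6 cards x 8 lanes.
SAS2_LINE_RATE_MBIT = 6_000   # 6 Gb/s per lane (SAS 2.0 line rate)
PAYLOAD_BITS, LINE_BITS = 8, 10  # 8b/10b encoding: 8 payload bits per 10 on the wire


def lane_payload_mb_per_s(line_rate_mbit=SAS2_LINE_RATE_MBIT):
    """Usable megabytes per second on one lane after encoding overhead."""
    return line_rate_mbit * PAYLOAD_BITS // LINE_BITS // 8


def aggregate_mb_per_s(cards, lanes_per_card):
    """Total usable SAS bandwidth across all cards, in MB/s."""
    return cards * lanes_per_card * lane_payload_mb_per_s()


total = aggregate_mb_per_s(cards=6, lanes_per_card=8)
measured = 9_000  # MB/s, the "over 9 GB/s" quoted above
print(f"SAS ceiling: {total} MB/s, measured: {measured} MB/s "
      f"({measured / total:.0%} of the SAS ceiling)")
```

With the SAS side that far from saturated, it is plausible that the limit sits elsewhere, which fits the point about PCIe slots and chipsets.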
Serial ATA 3.0 and SAS achieve 5-6 Gb/s. This system delivers 4 GB/s. It's really sad how these sloppy summaries make it to the front page.
Quote from TFA: "So you're getting the 4GB/sec. of PCIe bandwidth, not the 5Gbit/sec. or 6Gbit/sec. SAS bandwidth. You're getting almost an order of magnitude of bandwidth to the storage internally just because you're using an interface that's capable of it," Pollack said.
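The bits-versus-bytes mix-up is easy to check. This sketch converts the 4 GB/s PCIe figure from the quote into gigabits and compares it with the 5 and 6 Gbit/s SAS figures; it deliberately ignores encoding and protocol overhead on both sides, so treat the ratios as approximate.

```python
BITS_PER_BYTE = 8


def gbytes_to_gbits(gbytes_per_s):
    """Convert GB/s to Gbit/s (raw conversion, no protocol overhead)."""
    return gbytes_per_s * BITS_PER_BYTE


pcie_gbit = gbytes_to_gbits(4)  # 4 GB/s quoted for the PCIe-attached flash
for sas_gbit in (5, 6):         # SAS figures quoted in the article
    ratio = pcie_gbit / sas_gbit
    print(f"{pcie_gbit} Gbit/s vs {sas_gbit} Gbit/s -> {ratio:.1f}x")
```

So the PCIe figure is roughly five to six times a single SAS link, which is what "almost an order of magnitude" in the quote is loosely pointing at.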
They will probably save money compared to powering and cooling the equivalent disk array.
Wait a while until Write Amplification kicks in. Then they'll be screwed.
There are 2 kinds of people in this world: Those who write in decimal and those who don't
Wow, 50TB of flash is a lot of thumbdrives!