eBay Deploys 100TB of SSDs, Cuts Rackspace By Half
Lucas123 writes "eBay's QA division was facing mounting performance issues related to its exponential growth of virtual servers, so instead of purchasing more 15k rpm Fibre Channel drives, the company began migrating over to a pure SSD environment. eBay said half of its 4,000 VMs are now attached to SSDs. The changeout has improved the time it takes the online site to deploy a VM from 45 minutes to 5 minutes and had a tremendous impact on its rack space requirements. 'One rack [of SSD storage] is equal to eight or nine racks of something else,' said Michael Craft, eBay's manager of QA Systems Administration."
For sites like ebay i have no doubt this makes sense. For the average small business I suspect they are far less IO bound and need storage...
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
I'll be honest - I didn't really RTFA that closely, in part because it just fawns over the SSDs.
Can someone tell me why this is significant? (Because it's EBay, because it's the first large-scale deployment of SSDs like this, etc, etc)?
Thanks in advance (and sorry about the clueless SSD noob posting :) )
Eve Online went this route too for their blades.
They're expensive, but nowhere near as expensive as SSD. I guess if performance is that important, it makes sense but how many Ebay/Google/Amazon situations are there out there?
Of course everyone would love to replace all of their storage with SSD if price was no object.
The closest they come to mentioning cost is:
Though SSD is typically an order of magnitude more expensive than hard disk drive storage, Craft said the Nimbus arrays were "on par" with the cost of his previous storage, which a Nimbus spokesman said was from NetApp and HP 3PAR. (Craft declined to identify the vendors).
So, cost of new SSDs was similar to whatever HDDs they bought years ago? Yeah, that's kind of how it goes...
Nimbus prices its product on a per-terabyte basis - it charges $10,000 per usable terabyte
$10,000 per terabyte. Ok, then. Sure, it's faster, if you are willing and able to pay 10x the cost of *current* HDD-based systems...
So the entire eBay VM operation could fit into 6 racks?
200 physical servers @ 1RU each = 5 racks
10x 10TB 2U SSDs = half a rack
5x 2U switches = quarter rack
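A quick sanity check of that rack arithmetic, as a rough sketch. The 40 usable rack units per rack is an assumption implied by "200 servers @ 1RU = 5 racks"; the server, array, and switch counts are the commenter's guesses, not figures from the article.

```python
import math

# Assumption: 40 usable rack units (RU) per rack, implied by
# "200 physical servers @ 1RU each = 5 racks".
RU_PER_RACK = 40

# (count, height in RU) for each class of equipment in the comment.
equipment = {
    "servers": (200, 1),
    "ssd_arrays": (10, 2),
    "switches": (5, 2),
}

total_ru = sum(count * height for count, height in equipment.values())
racks_needed = total_ru / RU_PER_RACK

print(total_ru, racks_needed, math.ceil(racks_needed))  # 230 5.75 6
```

So the estimate of "6 racks" holds under those assumptions: 5.75 racks, rounded up.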
TFA reads like a thinly-veiled promo for Nimbus Data Systems, which I can only guess are pushing a Linux-based SAN appliance full of SSDs. Big whoop.
What I would love to know is: Why does eBay need 4000 VMs ?
-Billco, Fnarg.com
The article does not mention which kind of SSDs they've used, or have I missed something? That might be very interesting, especially when it comes to reliability. It's often claimed that SSDs are more reliable than traditional drives, but according to this http://www.tomshardware.com/reviews/ssd-reliability-failure-rate,2923.html that's not really true.
I got the impression ebay just terminated a hosting arrangement with Rackspace (the company) -- bringing it inhouse, and cutting Rackspace's revenues in half. :)
has improved the time it takes to deploy a VM from 45 minutes to 5 minutes
uh, any logical explanation for this? SSDs are snappier and the peak I/O can be faster compared to spindle drives - but not by a factor of 9, surely?
They cut Rackspace by half - Did they like... auction off half of that company?
I read a blog post a while back stating that SSDs fail a lot more than you would expect. Somewhere around a year of heavy use seems to take most of the life out of a consumer-grade SSD. Now I wonder how putting SSDs into RAID 5 (or 6, or whatever) will behave. If a certain model of SSD croaks around X write ops, then I think the nature of RAID, which spreads writes evenly across the drives, will mean that your entire array of SSDs will go bad pretty closely together. It must suck to have two more drives go belly up while rebuilding your array after the first drive failure.
Perhaps it would make sense to stagger SSDs in different phases of their lifetime to keep simultaneous failures at bay: use some burned-in drives and some fresh ones.
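A toy model of that staggering idea, as a rough sketch only. It assumes every drive has the same fixed write endurance and receives the same daily write load; the endurance and write figures are made up for illustration, and real SSD failures are far less deterministic than this.

```python
def days_until_wearout(endurance_tb, already_written_tb, daily_writes_tb):
    """Days left before a drive reaches its (assumed fixed) write endurance."""
    remaining_tb = max(endurance_tb - already_written_tb, 0.0)
    return remaining_tb / daily_writes_tb


# Hypothetical figures, not taken from the article or any datasheet.
ENDURANCE_TB = 1000.0    # total writes a drive survives
DAILY_WRITES_TB = 1.0    # writes each array member receives per day

# Four fresh drives: identical wear, so identical wear-out day.
fresh = [days_until_wearout(ENDURANCE_TB, 0.0, DAILY_WRITES_TB) for _ in range(4)]

# Four drives at different points in their life: wear-out dates spread out.
staggered = [
    days_until_wearout(ENDURANCE_TB, used_tb, DAILY_WRITES_TB)
    for used_tb in (0.0, 100.0, 200.0, 300.0)
]

print(fresh)      # [1000.0, 1000.0, 1000.0, 1000.0]
print(staggered)  # [1000.0, 900.0, 800.0, 700.0]
```

In this simplified model the fresh array loses every member on the same day, while the staggered array leaves 100 days between wear-outs, which is plenty of time for a rebuild.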
People, what a bunch of bastards
Is he talking about performance or price? I can imagine that a single rack of enterprise SSDs could easily cost the same as 9 racks of anything else.
Calling someone a "hater" only means you can not rationally rebut their argument.
so a drive with 1tb space is equal to 9 tb of anything else
BULLSH%T
While most people instantly gravitate towards the upfront cost and performance of going solid state, I would make one important point. Reducing your data center footprint by 9 racks is significant in terms of power and cooling, and that is all on top of the purchase price and support contracts. Regardless of whether eBay owns its own data center or colocates, the cost per square foot in a data center and the continued operation of such a large storage system mean the switch is more than likely to provide a higher return on investment. eBay isn't in the business of looking cool and hip, they're in the business of selling stuff as cheaply as possible and I'm certain their CIO cares only about the bottom line.
Might even pay for itself by the year's end
Barring any major catastrophes - expect to see many companies with server farms go this route soon
..........FULL STOP.
Finally we are getting a chance of seeing real reliability stats of the SSD!
That is, if eBay would be kind enough to publish the data a couple of years from now.
All hope abandon ye who enter here.
I was under the (misguided?) impression that SSDs weren't, as yet, enterprise ready in terms of reliability?
You discover modern hardware does virtualization real well. You get a good host software, like vSphere or something on new hardware and you have extremely near native speeds. The CPUs handle almost everything just like it was running as the host OS, and sharing the CPU resources works great. Memory is likewise real good, in fact VMs only use what they need at the time so they can have a higher memory limit collectively than the system RAM and share, so long as they don't all try to use it all at once.
You really do have a situation where you can divide down a system pretty evenly and lose nothing. Let's say you had an app that used 2 cores to the max all the time and 3GB of RAM. You'd find that it would run more or less just as well on a VM server with 4 cores and 8GB of RAM, half assigned to each of two VMs, as it would on two 2-core 4GB RAM boxes. ...Right up until you talk storage, then everything falls down. Have two VMs heavily access one regular magnetic drive at the same time and performance doesn't drop in half, it goes way below that. The drive is not good at random access, and that is what it gets with two VMs hitting it at the same time, even if their individual operations are rather sequential.
It is a bottleneck that can really keep things from scaling like the other hardware can handle.
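A back-of-the-envelope model of why two sequential streams on one spindle do worse than half speed each. This is only a sketch: the 100 MB/s sequential rate, 8 ms seek, and 64 KB chunk read between head switches are assumed round numbers, not measurements, and real drives and schedulers are more complicated.

```python
def interleaved_throughput(seq_mb_s, seek_s, chunk_mb):
    """Total MB/s when the head must seek before every chunk it transfers."""
    transfer_s = chunk_mb / seq_mb_s
    return chunk_mb / (seek_s + transfer_s)


# Assumed, illustrative drive characteristics.
SEQ_MB_S = 100.0   # sequential throughput with a single stream
SEEK_S = 0.008     # average seek plus rotational delay per switch
CHUNK_MB = 0.064   # data read for one VM before switching to the other

fair_share_per_vm = SEQ_MB_S / 2
total = interleaved_throughput(SEQ_MB_S, SEEK_S, CHUNK_MB)
actual_per_vm = total / 2

print(f"ideal per VM:  {fair_share_per_vm:.1f} MB/s")
print(f"actual per VM: {actual_per_vm:.1f} MB/s")
```

With these numbers each VM gets a few MB/s instead of the 50 MB/s a clean halving would give, because most of the time goes to seeking rather than transferring. An SSD has no seek penalty, so the seek term effectively vanishes.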
At work I use VMs to maintain images for instructional labs (since they all have different, custom requirements). When I'm doing install work on multiple labs, I always do it sequentially. I have plenty of CPU, a hyper-threaded 4 core i7, plenty of RAM, 8GB, there's no reason I can't load up multiples. However since they all live on the same magnetic disk, it is slower to do the installs in parallel than sequential.
If I had an SSD, I'd fire up probably 3-4 at once and have them all do installs at the same time, as it would be faster.
It might also indicate that eBay has mistuned their environment.
Some idiots still think that for a web server you need to take the total amount of memory, divide it by the amount of RAM one Apache process takes, and then set the result as the number of Apache processes, thus leaving too little 'empty' for filesystem caching. Sadly enough, that mistuning advice was on the Apache webserver FAQ for years.
Having enough memory available for filesystem caching is the number 1 priority to reduce IO operations. Only with large database servers which have completely random access, this might not help. And with web applications such as ebay it's rarely ever completely random.
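To put rough numbers on that tuning argument, here is a small sketch comparing the "divide all RAM by process size" rule with one that first sets memory aside for the filesystem cache. The 8 GB of RAM, 50 MB per process, and 4 GB cache reserve are made-up illustrative values, and `max_workers` is a hypothetical helper, not anything Apache ships.

```python
def max_workers(total_ram_mb, per_process_mb, cache_reserve_mb=0):
    """Worker processes that fit after reserving RAM for filesystem cache."""
    usable_mb = max(total_ram_mb - cache_reserve_mb, 0)
    return usable_mb // per_process_mb


# Illustrative figures only.
TOTAL_RAM_MB = 8192
PER_PROCESS_MB = 50

# The rule criticised above: every megabyte goes to worker processes.
naive = max_workers(TOTAL_RAM_MB, PER_PROCESS_MB)

# Keeping half the RAM free so the OS can cache hot files.
with_cache = max_workers(TOTAL_RAM_MB, PER_PROCESS_MB, cache_reserve_mb=4096)

print(naive, with_cache)  # 163 81
```

The second figure is what you would feed into the process limit (MaxClients in Apache's prefork MPM); how much to reserve for cache depends on the size of the hot data set.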
But hey, it's ebay, and that they have a crappy infrastructure has been visible for years... running slowaris for a decade for SSL websites and such... absurd!
Any decent infrastructure should serve most of its pages without any IO operation taking place at all. Just like Facebook and Google do. Disk is merely intended for persistent storage, but most gets served out of filesystem cache. But hey, the fact that they rely on 'virtual machines' already says enough about their level of intelligence...
SSD is however great for IOPS, as others indicate; Oracle especially tends to do a lot of unpredictable random IO, which really gets a boost with SSDs... practical for small environments...
But I wonder what the ebay investment cost was, and what the performance increase would have been if they had invested that into RAM instead (without re-mistuning).
Greetings,
Jasper
Took me a while to figure out how much they actually cut from the end price; it's not something they're pleased to tell you.
Especially with the recent "adjustment of prices" at eBay, no wonder they have extra cash to waste on whatever idiotic idea comes to their IT management.
They get 8%+ of most below 500$/Euro items sold. Outrageous.
I hope my hosting company will change to such SSDs, and I will not have to wake up those rotating disks early in the morning.
Have you seen the mess that is Ebay's HTML and CSS? They could reduce the size of the average page by 50% probably, if they hired somebody who actually knew how to code properly.
All in all this is barely a dent in anything Ebay does. It sounds more like an experiment and hype of the drives they used.
When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
even the most touted and expensive 'enterprise ssd' can die out on you unexpectedly.
Read radical news here
SSD life is limited by the number of write operations. You can't use them like normal disks in the kind of business eBay is running.
But read operations are unlimited. So, if you are just going to read files from a disk, SSD makes the perfect candidate. In random reads, they are at minimum approx 40 times faster than the best HDD.
So, you just put in 250 SSDs, put your VM images on them, and, as the article says, it cuts your VM deployment time to other systems from 45 minutes to 5 minutes - there is nothing wrong with the article.
Not only random reads.
You have to use SSDs for ALL reads, but HDDs for all writes, since SSD life is still limited by the number of write operations conducted.
I want the old hard drives! Please?
Hmmm did eBay buy their SSDs on eBay???
Fibre Channel Storage equipment floods EBay!
'nuff said.
I'm not a lawyer, but I play one on the Internet. Blog
The sweet spot in enterprise storage is doing deduplication with SSD on both direct-attached and networked storage, plus 15k 2.5" and 7.2K 3.5" disk for the rest. Deduplication saves a lot of space, and with SSD it works like cache, especially in scaled-out environments.
From the paper "Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?" by Bianca Schroeder and Garth A. Gibson:
In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks.
So the annual failure rates are apparently similar, regardless of vendor claims
Am I the only one who read the title as ebay deploys 100TB of STDs, cuts rackspace in half?
Yeah, and their bank balance. And I thought our £2 million SSD database cluster was expensive..
I wrote my first program at the age of six, and I still can't work out how this website works.
Yeh, if you compare enterprise MLC SSD and enterprise 15k rpm HDDs, the cost per gigabyte is pretty close.
480 GB enterprise SSD can be had for $1300
vs
146 GB enterprise SAS 15k rpm 2.5" at $400; 3 of these cost you $1200 to get you 438 GB of storage
Of course you would probably get multiple SSDs and raid them for redundancy, but that is just an example.
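Working those example prices through as cost per gigabyte (a quick sketch using only the figures quoted in the comment above; they are the commenter's example prices, not current market data):

```python
# Example prices and capacities as quoted in the comment.
SSD_PRICE_USD, SSD_CAPACITY_GB = 1300, 480
HDD_PRICE_USD, HDD_CAPACITY_GB = 400, 146

ssd_per_gb = SSD_PRICE_USD / SSD_CAPACITY_GB
hdd_per_gb = HDD_PRICE_USD / HDD_CAPACITY_GB

# Three 15k drives to roughly match one SSD's capacity.
hdd_set_price = 3 * HDD_PRICE_USD
hdd_set_capacity = 3 * HDD_CAPACITY_GB

print(f"SSD: ${ssd_per_gb:.2f}/GB")   # SSD: $2.71/GB
print(f"HDD: ${hdd_per_gb:.2f}/GB")   # HDD: $2.74/GB
print(hdd_set_price, hdd_set_capacity)  # 1200 438
```

At these prices the SSD is actually marginally cheaper per gigabyte than the 15k SAS drives, before counting any redundancy, power, or cooling.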
Then if you factor in BOTH power and cooling savings you will see the SSD gets even closer or better for cost. There's a whole host of other savings involved in managing fewer disks, less space, and less strain on AC, and AC failures are less of an emergency because you have so much less heat being generated.
One thing I haven't seen mentioned for the power savings is that in an extremely high random read environment where SSDs shine the most, you're generally looking at one SSD replacing multiple hard drives.
If you can get rid of five 15k drives and replace them with 1 SSD, the capital cost difference is vastly reduced, if not eliminated, even before you look at power savings.
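A rough break-even sketch for that consolidation argument. The $1300 SSD and $400 15k drive prices are borrowed from the earlier comment in this thread as an assumption; this commenter did not give prices.

```python
import math

# Assumed prices, borrowed from the earlier cost-per-gigabyte comment.
SSD_PRICE_USD = 1300
HDD_PRICE_USD = 400


def hdd_cost(drive_count):
    """Capital cost of a given number of 15k drives."""
    return drive_count * HDD_PRICE_USD


# Smallest number of replaced 15k drives at which one SSD is cheaper.
break_even_drives = math.ceil(SSD_PRICE_USD / HDD_PRICE_USD)

# Capital saved in the 5-for-1 case described above.
savings_at_five = hdd_cost(5) - SSD_PRICE_USD

print(break_even_drives, savings_at_five)  # 4 700
```

Under those assumed prices, one SSD wins on capital cost as soon as it replaces four or more 15k drives bought purely for IOPS.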
I don't read AC A human right