Google Prefers DRAM to Hard Disks
KP writes: "I came across this interview with Google's CEO. A very interesting
read." It's interesting in part becase that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is
cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"
How often do you see DRAM fail compared to Hard Disks? A bit more reliability IMHO.
I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?
When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:
With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
Will I retire or break 10K?
Err. No.
I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250Gb of disk space for that miniscule cache. The one at Google must be of the order of a few hundred Terabytes, not Gigabytes.
On that basis, I echo the original query about how it can be economical to use RAM...
Simon
Physicists get Hadrons!
Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
Now if only Google could go out and do its own fact-checking, it wouldn't need to rely on other newspapers at all. Mark my words, by 2010 google will be the only place you go when you need information. Forget askjeeves, try listentogoogle. No humans will be involved. Scary.
By the way, this guy can't speak for beans.
The speech I give everyday is: "This is what we do. Is what you are doing consistent with that, and does it change the world?"
I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.
This just shows how limited the lifespan is of 32-bit 4GB architecture, especially for servers.
Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.
In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the DRAM option by a factor of greater than 25. If you do the math, it's possible.
Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.
It's important to note, though, that he states DRAM is more efficient (cost-wise? speed-wise? whatever) when it comes to storing seekable data. I wonder if that means they're using DRAM for their search indices and plain old disk for their cached content. DRAM is ideal for completely random access to multiple pieces of data, whereas disk does okay for serial access to data, the location of which is well known.
DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them each having at least two hard drive motors spinning 24/7.
this makes more sence then:
/.ed site)
PC World: What are Google's biggest challenges?
Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.
If you have computers where 4 GB is not very much memory, but use the amount we use on out HD for memory i would have a dram shortage too.
And i bet they store only the most frequest used part of the index in memory.
Did you notice when you access the google cache this very slow compared to a search? Even if that cache was accessed frequently (because it references a
Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:
1. If each box only handles a part of the web, it is possible that most of the space on it's drive (or drives) are wasted anyway.
2. If disk latency means that cpus spend idle time, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
4. Which leads, of course, to lower A/C bills during the warm weather.
5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.
OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?
Another service that takes advantage of recency is something we just added called Overview of Today's Headlines. Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
This is a pretty cool idea. I only hope they make a RSS feed out of it so that I can use it in my companies new Portal environment. That would be really great! I love Google!
Check it out here.
KangarooBox - We make IT simple!
DRAM requires little electricity and produces almost no heat.
Hard disks consume large amounts of electricity, and produce large amounts of heat, since they consist of pieces of metal spinning at 7200rpm.
Using DRAM upfront costs quite a bit more, but uses less electricity and requires fewer chillers, condensors, etc to keep cool.
Conformity is the jailer of freedom and enemy of growth. -JFK
Individually, the mean time betweeen failure for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
-Eldurbarn
I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.
As I mentioned above, I look after a small but targetted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European union site http://europa.eu.int. This is a fairly innocuous site, but if I do:
That's a 7.7Gb website, and that's just the text (in fact I only search for
I just think that your estimate for the cache size is a long way short of the real figure...
Simon
Physicists get Hadrons!
See that "mature content filter"?
How about a "mature content ONLY search"?
********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.
These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)
Performance was absolutely unreal: more than twice the speed of SCSI, in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*
If you have to do anything requiring lots of I/O (like database,) you _really_ do want one of these things or something like it.
Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives) but if you are at all measuring time, then everything else looses the comparison; if you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units because they truely are part of the cost (read necessary.)
Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.
As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.
Everything in the Universe sucks: It's the law!
Solid state everyting would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives which have increased only slightly in that regard.
As someone else said, it is only a matter of time.
Dyolf Knip