Google Prefers DRAM to Hard Disks
KP writes: "I came across this interview with Google's CEO. A very interesting
read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is
cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"
I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?
When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:
With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
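To put rough numbers on the seek argument above, here is a minimal sketch. The two latencies are illustrative assumptions (about 10 ms per random disk seek, about 100 ns per random DRAM access), not figures from the interview, and the function names are made up for this example.

```python
# Illustrative latencies, in nanoseconds. Both values are assumptions.
DISK_SEEK_NS = 10_000_000   # ~10 ms per random seek on a rotating disk
DRAM_ACCESS_NS = 100        # ~100 ns per random DRAM access


def wall_time_ns(lookups: int, replicas: int, latency_ns: int) -> int:
    """Time to finish `lookups` random reads spread evenly over `replicas`
    copies of the data, with each copy serving one read at a time."""
    per_replica = -(-lookups // replicas)  # ceiling division
    return per_replica * latency_ns


def replicas_needed(lookups_per_s: int, latency_ns: int) -> int:
    """Copies of the data needed to sustain a target random-read rate."""
    needed = -(-(lookups_per_s * latency_ns) // 1_000_000_000)
    return max(1, needed)


if __name__ == "__main__":
    million = 1_000_000
    print("one disk:       ", wall_time_ns(million, 1, DISK_SEEK_NS) / 1e9, "s")
    print("1,000 mirrors:  ", wall_time_ns(million, 1000, DISK_SEEK_NS) / 1e9, "s")
    print("one DRAM copy:  ", wall_time_ns(million, 1, DRAM_ACCESS_NS) / 1e9, "s")
    print("disk copies for 1M reads/s:", replicas_needed(million, DISK_SEEK_NS))
    print("DRAM copies for 1M reads/s:", replicas_needed(million, DRAM_ACCESS_NS))
```

Under those assumed latencies, a million seeks take hours on one disk and seconds on a 1,000-way mirror, while a single DRAM copy finishes in a fraction of a second. Real hardware queues and overlaps requests, so treat this as an order-of-magnitude picture only.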
Will I retire or break 10K?
Err. No.
I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250 GB of disk space for that minuscule cache. The one at Google must be of the order of a few hundred terabytes, not gigabytes.
On that basis, I echo the original query about how it can be economical to use RAM...
Simon
Physicists get Hadrons!
This just shows how limited the lifespan of the 32-bit, 4 GB architecture is, especially for servers.
The system goes on-line on August 4th, 1997. Human decisions are removed from strategic searching. Google begins to learn, at a geometric rate. It becomes self-aware at 2:14 am, eastern time, August 29th. In a panic, they try to pull the plug.
Google fights back.
This makes more sense then:
/.ed site)
PC World: What are Google's biggest challenges?
Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.
If you have computers where 4 GB is not very much memory, and you use for memory the amount the rest of us use on our hard disks, I would have a DRAM shortage too.
And I bet they store only the most frequently used part of the index in memory.
Did you notice that accessing the Google cache is very slow compared to a search? Even if that cache was accessed frequently (because it references a
Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:
1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
2. If disk latency means that CPUs sit idle, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPUs, power supplies, etc.
3. Even with the same number of boxes, lower power draw means smaller and/or fewer UPSes are required. With fewer boxes, even more reduction.
4. Which leads, of course, to lower A/C bills during the warm weather.
5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.
OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?
Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
-Eldurbarn
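The "fewer boxes, fewer things breaking" part of the equation above can be sketched as simple arithmetic. Everything numeric here is a hypothetical placeholder (the MTBF, the fleet sizes, the per-box query rates), and the model assumes a constant failure rate, which real drives only approximate.

```python
def boxes_needed(target_qps: int, qps_per_box: int) -> int:
    """Boxes required to serve `target_qps` if each box sustains `qps_per_box`."""
    return -(-target_qps // qps_per_box)  # ceiling division


def expected_failures_per_day(units: int, mtbf_hours: int) -> float:
    """Average failures per day across `units` identical parts,
    assuming a constant failure rate of 1 / MTBF per part."""
    return units * 24 / mtbf_hours


if __name__ == "__main__":
    MTBF_HOURS = 500_000  # hypothetical per-drive MTBF
    # Hypothetical throughput: a disk-bound box vs. a box serving from DRAM.
    disk_boxes = boxes_needed(target_qps=100_000, qps_per_box=10)
    dram_boxes = boxes_needed(target_qps=100_000, qps_per_box=1_000)
    print("disk-bound boxes:", disk_boxes)
    print("DRAM-backed boxes:", dram_boxes)
    print("drive failures/day, one drive per disk-bound box:",
          expected_failures_per_day(disk_boxes, MTBF_HOURS))
```

With these made-up inputs a single drive fails about once in decades, yet a fleet of ten thousand sees a failure roughly every other day, which is the "constant drain" being described.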
I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.
As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European Union site http://europa.eu.int. This is a fairly innocuous site, but if I do:
That's a 7.7 GB website, and that's just the text (in fact I only search for
I just think that your estimate for the cache size is a long way short of the real figure...
Simon
Physicists get Hadrons!