Google Prefers DRAM to Hard Disks

Posted by ryuzaki0 on Sunday February 3, 2002 @02:19AM from the speed-versus-spin dept.

KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

8 of 354 comments

From the article: Why DRAM is so fast by yerricde · 2002-02-03 02:30 · Score: 5, Informative

I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:

Schmidt: "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks -- which is kind of amazing. It turns out that DRAM is 200,000 times more efficient when it comes to storing seekable data. In a disk architecture, you have to wait for a disk arm to retrieve information off of a hard-disk platter. DRAM is not only cheaper, but queries are lightning fast."

With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
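
A back-of-the-envelope sketch of the seeks-per-second argument above. The latencies are assumptions chosen for illustration (roughly 10 ms per random disk seek, roughly 50 ns per random DRAM access), not figures from the article, and the function names are hypothetical. With those assumed values the ratio happens to come out at the 200,000 that Schmidt quotes.

```python
# Assumed latencies, for illustration only (not taken from the article).
DISK_SEEK_NS = 10_000_000    # ~10 ms per random disk seek (assumption)
DRAM_ACCESS_NS = 50          # ~50 ns per random DRAM access (assumption)

NS_PER_SECOND = 1_000_000_000


def seeks_per_second(latency_ns: int) -> int:
    """Random accesses one device can serve per second, one at a time."""
    return NS_PER_SECOND // latency_ns


def mirrors_needed(target_seeks_per_second: int, latency_ns: int) -> int:
    """Copies of the data needed to reach a target seek rate, rounded up."""
    per_device = seeks_per_second(latency_ns)
    return -(-target_seeks_per_second // per_device)  # ceiling division


disk_rate = seeks_per_second(DISK_SEEK_NS)
dram_rate = seeks_per_second(DRAM_ACCESS_NS)
speedup = dram_rate // disk_rate

print(f"disk: {disk_rate} seeks/s, DRAM: {dram_rate} seeks/s, ratio: {speedup}")
print("disk mirrors for 1M seeks/s:", mirrors_needed(1_000_000, DISK_SEEK_NS))
print("DRAM copies for 1M seeks/s:", mirrors_needed(1_000_000, DRAM_ACCESS_NS))
```

This ignores caching, request queuing, and network delays, so it only shows the shape of the trade-off: the slower the seek, the more copies of the data you need to hit the same query rate.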

--
Will I retire or break 10K?
Re:Cost v Speed by Space+cowboy · 2002-02-03 02:31 · Score: 5, Interesting

JohnHegarty scribbled

I am sure the google archive is only a few 100gb

Err. No.

I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250Gb of disk space for that minuscule cache. The one at Google must be of the order of a few hundred Terabytes, not Gigabytes.

On that basis, I echo the original query about how it can be economical to use RAM...

Simon

--
Physicists get Hadrons!
Fewer servers needed by michaelmalak · 2002-02-03 02:39 · Score: 5, Interesting

I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

Google's Eric Schmidt probably means that fewer replicated servers are needed. If we take his stat of 200,000x speedup at face value, then you would need 200,000 times as many hard-drive-based servers as DRAM-based servers. There are many other factors involved such as communication delays and scalability, but you get the idea.

This just shows how limited the lifespan of the 32-bit, 4GB architecture is, especially for servers.
Re:Scary! by Phosphor3k · 2002-02-03 02:44 · Score: 5, Funny

The system goes on-line on August 4th, 1997. Human decisions are removed from strategic searching. Google begins to learn, at a geometric rate. It becomes self-aware at 2:14 am, eastern time, August 29th. In a panic, they try to pull the plug.

Google fights back.
Re:Cost v Speed by leuk_he · 2002-02-03 03:09 · Score: 5, Interesting

This makes more sense then:
PC World: What are Google's biggest challenges?
Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.

If I had computers where 4 GB is not very much memory, and used for memory the amount the rest of us use on our hard disks, I would have a DRAM shortage too.

And I bet they store only the most frequently used part of the index in memory.

Did you notice that when you access the Google cache it is very slow compared to a search? Even if that cached page is accessed frequently (because it references a /.ed site).
Pretty amazing, but I can see it. by dinotrac · 2002-02-03 03:16 · Score: 5, Insightful

Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:

1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
2. If disk latency means that cpus spend idle time, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
4. Which leads, of course, to lower A/C bills during the warm weather.
5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.

OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?
The key is in the MTBF by eldurbarn · 2002-02-03 03:42 · Score: 5, Informative

My last job was at one of the "other" search engines. We had a disk farm somewhat smaller than Google (about 140 Tb), mostly configured in RAID arrays, and we were swapping out dead bricks every few days.
Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
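
A rough sketch of why a decent per-unit MTBF still means frequent swaps at scale. It assumes independent failures at a constant rate, so the farm-wide mean time between failures is the per-unit MTBF divided by the number of units. The 500,000-hour MTBF and the 4,000-unit count are made-up illustrative values, not figures from this post, and the function name is hypothetical.

```python
HOURS_PER_DAY = 24


def days_between_failures(mtbf_hours: float, unit_count: int) -> float:
    """Expected days between failures across a farm of identical units.

    Assumes independent units with a constant failure rate, so the
    farm-wide failure rate is unit_count / mtbf_hours per hour.
    """
    return mtbf_hours / unit_count / HOURS_PER_DAY


ASSUMED_MTBF_HOURS = 500_000   # per-unit MTBF (assumption)
ASSUMED_UNIT_COUNT = 4_000     # units in the farm (assumption)

single = days_between_failures(ASSUMED_MTBF_HOURS, 1)
farm = days_between_failures(ASSUMED_MTBF_HOURS, ASSUMED_UNIT_COUNT)

print(f"one unit: a failure every {single:,.0f} days on average")
print(f"{ASSUMED_UNIT_COUNT} units: a failure every {farm:.1f} days on average")
```

Under those assumptions a single unit looks nearly immortal, while the farm as a whole loses one every few days.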

--
-Eldurbarn
Re:Cost v Speed by Space+cowboy · 2002-02-03 03:52 · Score: 5, Informative

Alomex wrote:

The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.

I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.

As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European Union site http://europa.eu.int. This is a fairly innocuous site, but if I do:

cd /opt/search/var/sites/26_europa.eu.int
du -sk .
7731586 .

That's a 7.7Gb website, and that's just the text (in fact I only search for .htm, .asp, .php* and .html files). This particular website is growing at the rate of a couple of hundred Mb each month.
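
For reference, a small sketch that converts the `du -sk` figure above into larger units and projects the growth forward. `du -sk` reports sizes in 1024-byte blocks. The helper names are hypothetical, and the flat 200 MB per month is an assumed stand-in for "a couple of hundred Mb each month".

```python
KIB = 1024  # du -k reports sizes in 1024-byte blocks


def du_kib_to_bytes(kib_blocks: int) -> int:
    """Convert a `du -sk` block count to bytes."""
    return kib_blocks * KIB


def projected_bytes(current_bytes: int, growth_bytes_per_month: int, months: int) -> int:
    """Linear growth projection; real growth is unlikely to be this steady."""
    return current_bytes + growth_bytes_per_month * months


site_bytes = du_kib_to_bytes(7_731_586)          # figure quoted in the post
ASSUMED_GROWTH_PER_MONTH = 200 * 10**6           # 200 MB/month (assumption)
year_later = projected_bytes(site_bytes, ASSUMED_GROWTH_PER_MONTH, 12)

print(f"now: {site_bytes / 1e9:.2f} GB decimal, {site_bytes / 2**30:.2f} GiB binary")
print(f"after 12 months: {year_later / 1e9:.2f} GB decimal")
```

Whichever unit convention you pick, the size lands in the same ballpark as the figure quoted above, and a year of growth at the assumed rate adds a couple of gigabytes more.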

I just think that your estimate for the cache size is a long way short of the real figure...

Simon

--
Physicists get Hadrons!