Google Prefers DRAM to Hard Disks
KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard disks. Maybe if you buy in bulk?"
Err. No.
I maintain a tiny search engine (some 5,000 sites), with the data cached locally, just like Google. That minuscule cache takes ~250 GB of disk space. The one at Google must be on the order of a few hundred terabytes, not gigabytes.
On that basis, I echo the original query about how it can be economical to use RAM...
Simon
Physicists get Hadrons!
I had an opportunity to play with one on a 20-CPU Starfire domain, and it was pretty impressive. The unit I was using had 8 wide SCSI ports, all of them connected. Interestingly, when the system was pegged, system time was off the scale. There's probably a locking problem in the Solaris kernel that's the real bottleneck.
This just shows how limited the lifespan of the 32-bit, 4 GB architecture is, especially for servers.
If they made a 2 GB RAM drive in each of their 10,000 machines, that would be 20 TB of storage. That seems sufficient to me for most storage needs.
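The arithmetic behind that estimate, as a small Python sketch. The 2 GB and 10,000 figures are the commenter's; decimal units (1 TB = 1000 GB) are assumed, and the 200 TB target in the usage line is a hypothetical stand-in for the "few hundred terabytes" guessed earlier in the thread, not a known Google figure.

```python
import math

# Commenter's figures: a 2 GB RAM drive on each of 10,000 machines.
GB_PER_MACHINE = 2
MACHINES = 10_000

def total_tb(gb_per_machine: float, machines: int) -> float:
    """Aggregate RAM-drive capacity in decimal terabytes (1 TB = 1000 GB)."""
    return gb_per_machine * machines / 1000

def machines_needed(target_tb: float, gb_per_machine: float) -> int:
    """Smallest number of machines whose RAM drives together hold target_tb."""
    return math.ceil(target_tb * 1000 / gb_per_machine)

print(total_tb(GB_PER_MACHINE, MACHINES))      # 20.0
# Hypothetical target: 200 TB, standing in for "a few hundred terabytes".
print(machines_needed(200, GB_PER_MACHINE))    # 100000
```

Under these assumptions, capacity grows with machine count rather than with the size of any single box.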
You would still need to be able to direct searches to the machines that hold the part of the data you need. This would take a high-speed network and some clever programming. But it is doable.
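One common way to "direct searches to the machines that have the part of the data you need" is to hash-partition the index by term. The Python sketch below assumes that scheme; the thread doesn't say how Google actually partitions its index, and `machine_for` and `route_query` are hypothetical names. The 10,000 machine count is taken from the previous comment.

```python
from __future__ import annotations

import hashlib

NUM_MACHINES = 10_000  # machine count from the previous comment

def machine_for(term: str, num_machines: int = NUM_MACHINES) -> int:
    """Map an index term to the machine assumed to hold its posting list in RAM.

    MD5 is used only as a stable, well-spread hash: Python's built-in hash()
    of a str is randomized per process, so separate front ends would disagree.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines

def route_query(query: str, num_machines: int = NUM_MACHINES) -> dict[int, list[str]]:
    """Group a query's words by the machine that has to be contacted."""
    plan: dict[int, list[str]] = {}
    for term in query.lower().split():
        plan.setdefault(machine_for(term, num_machines), []).append(term)
    return plan

plan = route_query("DRAM versus hard disk")
for machine, terms in sorted(plan.items()):
    print(f"ask machine {machine} about {terms}")
```

The "high-speed network" part is then the fan-out: one request per machine in `plan`, with the partial results merged by the front end.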
I was always amazed at the speed of Google's search engine; now I have a little more of a clue as to why it is so fast.
Sounds to me like they might be able to sell their database software as a money-making product at some point. Oracle, watch out!
-- Never make a general statement.
A simpler way of saying this:
Do you want to buy one machine that costs $100,000 to do 1 million hits per X time?
-or-
Do you want to buy 1000 machines that cost $500 each to do 1000 hits per X time apiece?
In both cases we are talking about 1 million hits per X time.
In case 1, it costs one port on the master switch and $100,000 for the machine.
In case 2, it costs 1000 ports on the master switch (actually more switches and infrastructure) AND $500,000 for the machines.
Case 1 costs 20% as much as case 2, i.e., it is 80% cheaper. And we have not even talked about power, A/C, space... You need to look at the whole picture.
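The comparison above, written out as a small Python sketch. The machine counts, prices and throughputs are the commenter's; `port_cost` is a hypothetical parameter (no figure is given in the comment), so it defaults to 0 and only machine cost is counted.

```python
from dataclasses import dataclass

@dataclass
class Option:
    machines: int
    unit_cost: int          # dollars per machine
    hits_per_machine: int   # hits per X time

    def capacity(self) -> int:
        """Total hits per X time across all machines."""
        return self.machines * self.hits_per_machine

    def cost(self, port_cost: int = 0) -> int:
        """Machine cost plus one switch port per machine.

        port_cost is hypothetical; 0 means machines only.
        """
        return self.machines * (self.unit_cost + port_cost)

big_box = Option(machines=1, unit_cost=100_000, hits_per_machine=1_000_000)
cluster = Option(machines=1000, unit_cost=500, hits_per_machine=1000)

print(big_box.capacity(), cluster.capacity())   # 1000000 1000000
print(big_box.cost(), cluster.cost())           # 100000 500000
print(big_box.cost() / cluster.cost())          # 0.2
```

Any nonzero `port_cost` pushes the ratio further in case 1's favor, since case 2 pays for 1000 ports instead of one; power, A/C and space would need similar per-machine terms.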
DRAM is probably much cheaper than hard drives in terms of the electricity bill. Think of how many nodes their clusters have, and then imagine each of them having at least two hard drive motors spinning 24/7.
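A back-of-the-envelope version of that electricity argument in Python. Only "two drives per node, spinning 24/7" comes from the comment and the 10,000 node count from earlier in the thread; the per-drive wattage and the electricity price are hypothetical placeholders, so plug in real figures before drawing conclusions.

```python
NODES = 10_000             # node count quoted earlier in the thread
DRIVES_PER_NODE = 2        # "at least two hard drive motors" per node
WATTS_PER_DRIVE = 10.0     # hypothetical average draw per spinning drive
PRICE_PER_KWH = 0.10       # hypothetical electricity price, dollars
HOURS_PER_YEAR = 24 * 365  # spinning 24/7

def drive_energy_kwh_per_year(nodes: int, drives_per_node: int,
                              watts_per_drive: float) -> float:
    """Energy drawn by the drives alone, in kWh per year."""
    watts = nodes * drives_per_node * watts_per_drive
    return watts * HOURS_PER_YEAR / 1000

kwh = drive_energy_kwh_per_year(NODES, DRIVES_PER_NODE, WATTS_PER_DRIVE)
print(f"{kwh:,.0f} kWh/year, about ${kwh * PRICE_PER_KWH:,.0f}/year")
```

This only covers the drive side; DRAM draws power too, so a fair comparison would subtract a measured per-node DRAM figure.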
Actually, Google uses FreeBSD on their PCs.
This makes more sense, then (from the /.ed site):
PC World: What are Google's biggest challenges?
Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.
If I had computers where 4 GB is not very much memory, and used as much RAM as we use hard disk space, I would have a DRAM shortage too.
And I bet they store only the most frequently used part of the index in memory.
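Keeping only the hot part of an index in memory is what a cache with an eviction policy does. The Python sketch below assumes least-recently-used (LRU) eviction and uses a dict as a stand-in for the disk; `HotIndexCache` and `load_from_disk` are hypothetical names, not anything Google is known to use.

```python
from collections import OrderedDict
from typing import Callable, List

class HotIndexCache:
    """Keep recently used posting lists in RAM; load misses from a slower store."""

    def __init__(self, capacity: int, load_from_disk: Callable[[str], List[int]]):
        self.capacity = capacity
        self.load_from_disk = load_from_disk  # stands in for a disk read
        self._entries = OrderedDict()         # term -> posting list, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, term: str) -> List[int]:
        if term in self._entries:
            self._entries.move_to_end(term)   # mark as most recently used
            self.hits += 1
            return self._entries[term]
        self.misses += 1
        postings = self.load_from_disk(term)
        self._entries[term] = postings
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return postings

# Usage with a toy "disk": a dict of term -> document ids.
disk = {"google": [1, 4, 9], "dram": [2, 4], "disk": [3]}
cache = HotIndexCache(capacity=2, load_from_disk=lambda t: disk.get(t, []))
for term in ["google", "dram", "google", "disk", "dram"]:
    cache.get(term)
print(cache.hits, cache.misses)  # 1 4
```

A real index would see a heavily skewed query distribution, which is what makes a small in-memory slice pay off.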
Did you notice that accessing the Google cache is very slow compared to a search? Even if that cache was accessed frequently (because it references a
Hmmm... I can top that.
Just a thought:
When is it worthwhile to trade CPU for storage? In your case, I suspect the website has a degree of redundancy in its 7 GB of data; there is likely much duplication, both at the page level (duplicated CSS info) and at the snippet level (duplicated copyright disclaimers).
It is quite straightforward to discover this sharing (IIRC, this is exactly how LZW compression works, but with a smaller window) and significantly cut down your storage costs. Of course, now you take a CPU hit: storing new data becomes expensive, and just reading the data requires some pointer chasing.
The interesting thing is that the CPU hit isn't guaranteed to be a Bad Thing: your higher cache hit rate (indeed, your data may now fit entirely in RAM) will possibly (likely?) result in significant speedups.
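A minimal Python sketch of snippet-level sharing. It swaps in a simpler technique than LZW: whole snippets are deduplicated by content hash, so each distinct snippet (a repeated CSS block or copyright footer) is stored once and a page becomes a list of snippet ids. Hashing on write is the "CPU hit", and resolving ids on read is the "pointer chasing". `SnippetStore` and the sample snippets are hypothetical.

```python
import hashlib

class SnippetStore:
    """Store each distinct snippet once; a page is a list of snippet digests."""

    def __init__(self):
        self.snippets = {}  # digest -> snippet text (stored once)
        self.pages = {}     # url -> list of digests

    def add_page(self, url, snippets):
        ids = []
        for s in snippets:
            key = hashlib.sha1(s.encode("utf-8")).hexdigest()  # hash cost on write
            self.snippets.setdefault(key, s)                   # keep first copy only
            ids.append(key)
        self.pages[url] = ids

    def get_page(self, url):
        # One lookup per snippet on read: the "pointer chasing".
        return [self.snippets[k] for k in self.pages[url]]

    def stored_bytes(self):
        return sum(len(s.encode("utf-8")) for s in self.snippets.values())

css = "body { font-family: sans-serif; }"
footer = "Copyright (c) Example Corp. All rights reserved."
store = SnippetStore()
store.add_page("/a", [css, "Page A text", footer])
store.add_page("/b", [css, "Page B text", footer])
raw = sum(len(s.encode("utf-8")) for p in ("/a", "/b") for s in store.get_page(p))
print(raw, store.stored_bytes())
```

Byte-level compressors (LZW, or DEFLATE's LZ77) find repeats at a finer grain, but whole-snippet hashing keeps random access to individual pages cheap.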
IBM sells this technology. They call it ChipKill.
Perhaps this is what your company is looking for:
ChipKill
Solid-state everything would be great (wasn't there an article on solid-state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives, whose speeds have increased only slightly.
As someone else said, it is only a matter of time.
Dyolf Knip