Google Prefers DRAM to Hard Disks
KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"
Err. No.
I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250 GB of disk space for that minuscule cache. The one at Google must be of the order of a few hundred terabytes, not gigabytes.
On that basis, I echo the original query about how it can be economical to use RAM...
Simon
Physicists get Hadrons!
I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.
This just shows how limited the lifespan of the 32-bit, 4GB architecture is, especially for servers.
If they made a 2GB RAM Drive in each of their 10,000 machines then that would be 20 TB of storage. This seems sufficient to me for most storage needs.
You would still need to be able to direct searches to the machines that have the part of the data you need. This would take a high speed network and some clever programming. But it is doable.
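A minimal sketch of the two points above: the capacity arithmetic, and one way to direct a search to the machine holding the relevant part of the data. Hashing each term to a machine number is purely an assumption for illustration; nothing in the interview says how Google actually partitions its index, and the function names here are hypothetical.

```python
import hashlib

GB = 10**9
TB = 10**12


def total_capacity_bytes(machines, ram_disk_bytes):
    """Total storage if every machine contributes one RAM drive."""
    return machines * ram_disk_bytes


def shard_for_term(term, machines):
    """Map a search term to the machine assumed to hold its index entries.

    Hypothetical scheme: hash the term and take it modulo the machine count.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % machines


def route_query(query, machines):
    """Group the terms of a query by the machine that would serve them."""
    plan = {}
    for term in query.lower().split():
        plan.setdefault(shard_for_term(term, machines), []).append(term)
    return plan


if __name__ == "__main__":
    print(total_capacity_bytes(10_000, 2 * GB) / TB, "TB")
    print(route_query("dram versus hard disks", 10_000))
```

The front end would then send each group of terms to its machine over the network and merge the answers, which is where the high speed network and the clever programming come in.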
I was always amazed at the speed of Google's search engine; now I have a little more of a clue as to why it is so fast.
Sounds to me like they might be able to sell their database software as a money making product at some point. Oracle, watch out!
-- Never make a general statement.
Imagine that, to keep search query times at an acceptable level, you need 4 boxen with hard disks to perform as fast as 1 box with a wedge of RAM. Then the cost of the RAM alone makes 3 boxen unnecessary.
A simpler way of saying this:
Do you want to buy a machine that costs $100,000 per copy to do 1 million hits per X time?
-or-
Do you want to buy 1000 machines that cost $500 per copy and each do 1000 hits per X time?
In both cases we are talking about 1 million Hits per X time.
In case 1 - it costs a port on master switch and $100,000 for the machine.
In case 2 - it costs 1000 ports on master switch -- actually more switches and infrastructure. AND $500,000 for the machines.
Case 1 costs 20% of what case 2 does, i.e. it is 80% cheaper. We have not talked about power, A/C, space... Need to look at the whole picture.
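A quick check of the arithmetic in the two cases, using only the figures quoted above. Switch ports, power, A/C and space are left out because no numbers were given for them; the function names are just for illustration.

```python
def fleet_cost(machines, price_each):
    """Purchase cost of a fleet of identical machines, in dollars."""
    return machines * price_each


def fleet_throughput(machines, hits_each):
    """Total hits per X time for a fleet of identical machines."""
    return machines * hits_each


# Case 1: one big machine. Case 2: a thousand cheap ones.
big_cost = fleet_cost(1, 100_000)
small_cost = fleet_cost(1000, 500)

big_hits = fleet_throughput(1, 1_000_000)
small_hits = fleet_throughput(1000, 1000)

# Fraction of the case 2 purchase price that case 1 costs.
cost_ratio = big_cost / small_cost

if __name__ == "__main__":
    print("case 1:", big_cost, "dollars for", big_hits, "hits")
    print("case 2:", small_cost, "dollars for", small_hits, "hits")
    print("case 1 / case 2 cost ratio:", cost_ratio)
```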
less mirrors = less computers = less space
real estate is expensive.
The masses are the crack whores of religion.
DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them having at least two hard drive motors spinning 24/7.
actually google uses freebsd on their PCs
this makes more sense than:
/.ed site)
PC World: What are Google's biggest challenges?
Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.
If you have computers where 4 GB is not very much memory, and use for memory the amount we use on our HDs, I would have a DRAM shortage too.
And I bet they store only the most frequently used part of the index in memory.
Did you notice that when you access the Google cache it is very slow compared to a search? Even if that cache was accessed frequently (because it references a
Huh? Go to handhelds.org and look at the specs for the various linux handhelds. Few if any of them have hard disks; everything is run out of memory. This doesn't seem to have been much of a problem with linux (or any of the unix clones). A "ramdisk" isn't exactly a new concept in the unix environment.
An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa). This causes most apps' temp files to be in main memory, and eliminates rotational delays for these files.
In fact, this sort of trick was exactly why the unix "block device" abstraction was invented more than a quarter century ago. It allows you to have a file system on anything that can store data in addressable chunks called "blocks". Memory works just fine for this.
There's no real problem with mapping the entire file system to memory.
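To make the "block device" idea concrete, here is a toy sketch of a device that stores data in addressable blocks and happens to be backed by memory. It is an illustration of the abstraction only, not how any unix kernel implements a ramdisk; the class name, method names and the 512-byte default block size are assumptions.

```python
class MemoryBlockDevice:
    """A toy block device backed by a bytearray (illustrative only)."""

    def __init__(self, block_count, block_size=512):
        self.block_count = block_count
        self.block_size = block_size
        # All blocks live in one contiguous chunk of memory, zero-filled.
        self._store = bytearray(block_count * block_size)

    def _check(self, index):
        if not 0 <= index < self.block_count:
            raise IndexError("block index out of range")

    def read_block(self, index):
        """Return the full contents of one block as bytes."""
        self._check(index)
        start = index * self.block_size
        return bytes(self._store[start:start + self.block_size])

    def write_block(self, index, data):
        """Overwrite one block; short data is padded with zero bytes."""
        self._check(index)
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        start = index * self.block_size
        padded = bytes(data).ljust(self.block_size, b"\x00")
        self._store[start:start + self.block_size] = padded


if __name__ == "__main__":
    dev = MemoryBlockDevice(block_count=4, block_size=16)
    dev.write_block(2, b"hello")
    print(dev.read_block(2))
```

A file system only needs the read and write calls, so it does not care whether the blocks sit on a spinning platter or in RAM.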
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Hmmm... I can top that.
Just a thought:
When is it worthwhile to trade off CPU for storage? In your case, I suspect that the website has a degree of redundancy in its 7 gigs of data; there is likely much duplication, both at the page level (duplicated CSS info) and at the snippet level (duplicated copyright disclaimers).
It is quite straightforward to discover this sharing (IIRC exactly how LZW compression works, but w/ a smaller window) and significantly cut down your storage costs. Of course, now you have a CPU hit, where storing new data becomes expensive, and just reading the data requires some pointer chasing.
The interesting issue is that the CPU hit isn't guaranteed to be a Bad Thing: your higher cache hit rate (indeed, your data may fit in ram entirely now) will possibly (likely?) result in significant speedups.
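A minimal sketch of that trade-off, assuming duplication shows up as identical lines across pages. Note that this is plain line-level deduplication by hash, not LZW; it was picked only because it is the shortest way to show shared snippets being stored once and found again through pointers. The class and method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Store pages as lists of pointers into a pool of unique snippets."""

    def __init__(self):
        self.chunks = {}  # digest -> snippet bytes, each stored once
        self.pages = {}   # page name -> list of digests (the "pointers")

    def put(self, name, text):
        """Split a page into lines and store only lines not seen before."""
        digests = []
        for snippet in text.split("\n"):
            data = snippet.encode("utf-8")
            digest = hashlib.sha256(data).hexdigest()  # the CPU hit on write
            self.chunks.setdefault(digest, data)
            digests.append(digest)
        self.pages[name] = digests

    def get(self, name):
        """Rebuild a page by chasing its pointers."""
        return "\n".join(
            self.chunks[d].decode("utf-8") for d in self.pages[name]
        )

    def stored_bytes(self):
        """Bytes held in the snippet pool (pointer overhead not counted)."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    store.put("a.html", "Hello\n(c) Example")
    store.put("b.html", "World\n(c) Example")
    print(store.get("a.html"))
    print(store.stored_bytes(), "bytes stored")
```

Whether the saving is worth it depends on how long the shared snippets are compared with the pointers, which this sketch does not account for.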
The company I work for makes computers with a lot of RAM and so we've been researching how to survive a RAM chip failure, but as far as I know no system implements such a technology.
Google doesn't cache images, Google doesn't index or cache dynamic (scripted) content, and Google caches PDFs as plaintext.
However, they are definitely on the scale of terabytes. "Searched the web for a.
Results 1 - 10 of about 1,470,000,000. Search took 0.31 seconds." Assuming an average of ~25k cached per link, roughly 1.47 billion links would leave a cache of about 37,632,000,000,000 bytes (about 37.6 TB). However, the cache doesn't necessarily need to be stored on RAM disks. He clearly states that it's 200,000 times more efficient for _seekable_ data. This means not the 'cached' data but rather the stuff that the search algorithm looks at to show you appropriate hits. So the heart of the 'search' engine is using RAM exclusively, but 'cached' data would almost certainly still be stored on HDs, unless of course someone has built Google a bunch of 120GB DRAM disks that use conventional HD interfaces (sorta like the flash memory drives, only on steroids when it comes to speed).
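The cache-size estimate above can be reproduced in a couple of lines. The only assumption beyond the quoted figures is that "25k" means 25 × 1024 bytes, which is what makes the numbers match.

```python
KIB = 1024  # assuming "25k" means 25 * 1024 bytes


def estimated_cache_bytes(links, avg_page_bytes):
    """Rough cache size: number of cached links times average page size."""
    return links * avg_page_bytes


# Result count quoted from the search page, and the guessed average size.
links = 1_470_000_000
cache_bytes = estimated_cache_bytes(links, 25 * KIB)

if __name__ == "__main__":
    print(cache_bytes, "bytes")
    print(cache_bytes / 10**12, "TB")
```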
It could even be misleading: Google could have meant flash memory HDs were cheaper but mistakenly referred to them as DRAM.
https://www.gnu.org/philosophy/free-sw.html
IBM sells this technology. They call it ChipKill.
Perhaps this is what your company is looking for:
ChipKill
Solid state everything would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives, which have increased only slightly in that regard.
As someone else said, it is only a matter of time.
Dyolf Knip