Slashdot Mirror


Google Prefers DRAM to Hard Disks

KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard disks. Maybe, if you buy in bulk?"

17 of 354 comments

  1. Re:Cost v Speed by Space+cowboy · · Score: 5, Interesting
    JohnHegarty scribbled

    I am sure the google archive is only a few 100gb


    Err. No.

    I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250GB of disk space for that minuscule cache. The one at Google must be of the order of a few hundred terabytes, not gigabytes.

    On that basis, I echo the original query about how it can be economical to use RAM...

    Simon
    --
    Physicists get Hadrons!
  2. Imperial MegaRam? by Ben+Jackson · · Score: 4, Interesting
    They may be referring to Imperial Technology's MegaRam solid state disks (SSDs). They claim about 36,000 IO/sec. Compare that with 80-120 IO/sec on a typical SCSI drive. I'm pretty sure that eBay is using them.

    I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.

  3. Fewer servers needed by michaelmalak · · Score: 5, Interesting
    I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?
    Google's Eric Schmidt probably means that fewer replicated servers are needed. If we take his stat of 200,000x speedup at face value, then you would need 200,000 times as many hard-drive-based servers as DRAM-based servers. There are many other factors involved such as communication delays and scalability, but you get the idea.

    This just shows how limited the lifespan is of 32-bit 4GB architecture, especially for servers.

  4. RAM Disks by buckrogers · · Score: 3, Interesting

    If they made a 2GB RAM Drive in each of their 10,000 machines then that would be 20 TB of storage. This seems sufficient to me for most storage needs.

    You would still need to be able to direct searches to the machines that have the part of the data you need. This would take a high speed network and some clever programming. But it is doable.
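
A rough sketch of the two ideas above: the capacity arithmetic, and one possible way to direct a search to the machine that holds the relevant slice of data. The machine count and RAM drive size come from the comment; the hash-modulo routing and the function names are assumptions for illustration, not anything Google has described.

```python
import hashlib

MACHINES = 10_000   # figure from the comment
RAM_DISK_GB = 2     # figure from the comment


def total_capacity_tb(machines=MACHINES, gb_each=RAM_DISK_GB):
    """Aggregate RAM drive capacity in TB, taking 1 TB = 1000 GB."""
    return machines * gb_each / 1000


def machine_for(term, machines=MACHINES):
    """Pick the machine holding the index slice for a search term.

    Hypothetical scheme: hash the term, then reduce it modulo the
    number of machines so the same term always maps to the same box.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % machines
```

Here `total_capacity_tb()` gives 20.0, matching the 20 TB figure, and `machine_for("slashdot")` returns a machine number between 0 and 9999. A real system would also need replication and a way to add machines without reshuffling everything, which plain modulo hashing does not give you.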

    I always was amazed at the speed of Google's search engine; now I have a little more of a clue as to why it is so fast.

    Sounds to me like they might be able to sell their database software as a money making product at some point. Oracle, watch out!

    --
    -- Never make a general statement.
  5. Re:Cost v Speed by Anonymous Coward · · Score: 1, Interesting

    Imagine that, to keep search response times at an acceptable level, you need 4 boxen with hard disks to perform as fast as 1 box with a wedge of RAM. Then the one-time cost of the RAM means 3 boxen are no longer needed.

  6. Re:From the article: Why DRAM is so fast by jackb_guppy · · Score: 4, Interesting

    A simpler way of saying this:

    Do you want to buy a machine that costs $100,000 per copy to do 1 million hits per X time?

    -or-

    Do you want to buy 1000 machines that cost $500 per copy to do 1000 hits each per X time?

    In both cases we are talking about 1 million Hits per X time.

    In case 1 - it costs a port on master switch and $100,000 for the machine.

    In case 2 - it costs 1000 ports on master switch -- actually more switches and infrastructure. AND $500,000 for the machines.

    Case 1 is 80% cheaper than case 2 (it costs 20% as much). We have not talked about power, A/C, space... Need to look at the whole picture.
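
The comparison above can be checked with a few lines of Python. The prices and hit rates are the ones in the comment; the function names are made up for this sketch, and switch ports, power, and space are left out, just as they are in the comment's numbers.

```python
def fleet_cost(machines, price_each):
    """Total purchase price of a fleet of identical machines."""
    return machines * price_each


def fleet_hits(machines, hits_each):
    """Total hits per X time across the fleet."""
    return machines * hits_each


# Case 1: one big machine. Case 2: a thousand small ones.
big_cost = fleet_cost(1, 100_000)
small_cost = fleet_cost(1000, 500)

big_hits = fleet_hits(1, 1_000_000)
small_hits = fleet_hits(1000, 1000)

# Case 1 as a percentage of case 2's price (integer arithmetic).
cost_ratio_percent = big_cost * 100 // small_cost
```

Both fleets come out at 1,000,000 hits per X time, and `cost_ratio_percent` is 20: the single machine costs a fifth of what the thousand small ones do, before any of the infrastructure costs are counted.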

  7. another reason by oyenstikker · · Score: 1, Interesting

    less mirrors = less computers = less space
    real estate is expensive.

    --
    The masses are the crack whores of religion.
  8. Something Nobody's Mentioned by Guppy06 · · Score: 4, Interesting

    DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them each having at least two hard drive motors spinning 24/7.

  9. Re:RAM vs. HDD by Anonymous Coward · · Score: 3, Interesting

    actually google uses freebsd on their PCs

  10. Re:Cost v Speed by leuk_he · · Score: 5, Interesting

    this makes more sense then:
    PC World: What are Google's biggest challenges?
    Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.


    If I had computers where 4 GB is not very much memory, and used for memory the amount the rest of us use on our HDs, I would have a DRAM shortage too.

    And I bet they store only the most frequently used part of the index in memory.

    Did you notice that when you access the Google cache it is very slow compared to a search? Even if that cache is accessed frequently (because it references a /.ed site).

  11. Re:I've always wondered by jc42 · · Score: 2, Interesting

    Huh? Go to handhelds.org and look at the specs for the various linux handhelds. Few if any of them have hard disks; everything is run out of memory. This doesn't seem to have been much of a problem with linux (or any of the unix clones). A "ramdisk" isn't exactly a new concept in the unix environment.

    In fact, this sort of trick was exactly why the unix "block device" abstraction was invented more than a quarter century ago. It allows you to have a file system on anything that can store data in addressable chunks called "blocks". Memory works just fine for this.

    An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa). This causes most apps' temp files to be in main memory, and eliminates rotational delays for these files.

    There's no real problem with mapping the entire file system to memory.
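
A small illustration of the /tmp-in-memory trick from a program's point of view. This is a sketch under assumptions: `/dev/shm` and `/run/shm` are memory-backed mounts on many Linux systems but not guaranteed to exist, and the helper names are invented here. If neither directory is usable, the code falls back to the default temp directory.

```python
import os
import tempfile


def memory_backed_dir(candidates=("/dev/shm", "/run/shm")):
    """Return the first candidate that exists and is writable, else None."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None


def write_scratch(data: bytes) -> bytes:
    """Write data to a temp file, read it back, and return what was read.

    With dir=None, tempfile uses the platform's default temp directory.
    """
    with tempfile.TemporaryFile(dir=memory_backed_dir()) as f:
        f.write(data)
        f.seek(0)
        return f.read()
```

The program itself does not care where the file lives; only the mount behind the directory decides whether a disk is involved, which is the point of the block device abstraction.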

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  12. Re:Overview of Today's Headlines by costas · · Score: 3, Interesting

    Hmmm... I can top that.

  13. Re:Cost v Speed by jovlinger · · Score: 3, Interesting

    Just a thought:

    When is it worthwhile to trade off CPU for storage? In your case, I suspect that the website has a degree of redundancy in its 7 gigs of data; there is likely much duplication, both at the page level (duplicated CSS info) and at the snippet level (duplicated copyright disclaimers).

    It is quite straightforward to discover this sharing (IIRC exactly how LZW compression works, but w/ a smaller window) and significantly cut down your storage costs. Of course, now you have a CPU hit, where storing new data becomes expensive, and just reading the data requires some pointer chasing.

    The interesting issue is that the CPU hit isn't guaranteed to be a Bad Thing: your higher cache hit rate (indeed, your data may fit in ram entirely now) will possibly (likely?) result in significant speedups.
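
One simple way to get this kind of sharing is sketched below. Note that it is not LZW: it is plain content-hash deduplication at line granularity, chosen because it is short. The class and method names are hypothetical. Each distinct line is stored once, and a page is just a list of hashes, which is the "pointer chasing" mentioned above.

```python
import hashlib


class DedupStore:
    """Store each distinct line once; pages keep a list of line hashes."""

    def __init__(self):
        self.chunks = {}  # hash -> line bytes
        self.pages = {}   # url -> list of hashes, in page order

    def put(self, url, text):
        keys = []
        for line in text.split("\n"):
            data = line.encode("utf-8")
            key = hashlib.sha1(data).hexdigest()
            # Hashing every line is the CPU cost paid on each store.
            self.chunks.setdefault(key, data)
            keys.append(key)
        self.pages[url] = keys

    def get(self, url):
        # Reading means one lookup per line instead of one contiguous read.
        return "\n".join(self.chunks[k].decode("utf-8") for k in self.pages[url])

    def stored_bytes(self):
        """Bytes of line data actually kept (ignores hash overhead)."""
        return sum(len(c) for c in self.chunks.values())
```

Two pages that share a copyright footer store that footer once. Whether the saving outweighs the per-line hash overhead depends on how long and how repetitive the lines are, so this only pays off on data with real duplication.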

  14. Re:Additionally by Blind+Lemon · · Score: 2, Interesting
    With hard disks you have things like RAID to protect against disk failure. No such thing with RAM. Sure, you can get protection from a bit going bad, but not for losing a chip.

    The company I work for makes computers with a lot of RAM and so we've been researching how to survive a RAM chip failure, but as far as I know no system implements such a technology.

  15. Re:Cost v Speed by kesuki · · Score: 2, Interesting

    Google doesn't cache images, Google doesn't index or cache dynamic (scripted) content, and Google caches PDFs as plaintext.
    However, they are definitely on the scale of terabytes. "Searched the web for a.
    Results 1 - 10 of about 1,470,000,000. Search took 0.31 seconds." Assuming an average of ~25k cached per link, 1.47 billion links would leave a cache of about 37,632,000,000,000 bytes. However, the cache doesn't necessarily need to be stored on RAM disks. He clearly states that it's 200,000 times more efficient for _seekable_ data. This means not the 'cached' data but rather the stuff that the search algorithm looks at to show you appropriate hits. So the heart of the 'search' engine is using RAM exclusively, but 'cached' data would almost certainly still be stored on HDs, unless of course someone has built Google a bunch of 120GB DRAM disks that use conventional HD interfaces (sorta like the flash memory drives, only on steroids when it comes to speed).
    It could even be misleading: Google could have meant flash memory HDs were cheaper but mistakenly referred to them as DRAM.
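
The cache estimate above works out exactly if "~25k" is read as 25 × 1024 bytes, which is an assumption about what the commenter meant. A quick check in Python:

```python
RESULTS = 1_470_000_000       # "about 1,470,000,000" from the quoted search
BYTES_PER_PAGE = 25 * 1024    # assumption: "~25k" means 25 KiB per cached page

cache_bytes = RESULTS * BYTES_PER_PAGE

# The same figure in decimal terabytes (1 TB = 10**12 bytes).
cache_tb = cache_bytes / 10**12
```

`cache_bytes` comes to 37,632,000,000,000, i.e. roughly 37.6 TB, so the "scale of terabytes" claim holds under these assumptions.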

  16. Re:Additionally by Defiler · · Score: 3, Interesting

    IBM sells this technology. They call it ChipKill.
    Perhaps this is what your company is looking for:
    ChipKill

  17. Re:Hard disk is an obsolete technology by Dyolf+Knip · · Score: 4, Interesting
    So hard drives are about 10 years ahead of RAM in terms of $/MB? Sounds about right. 1GB hard drives were on the high end of normal users at the time, as is 1GB of RAM today (though I seem to recall having more than 10MB RAM at the time). Assuming the same increases in the next decade... 100GB RAM and 10TB drives. I like.

    Solid state everything would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives which have increased only slightly in that regard.

    As someone else said, it is only a matter of time.

    --
    Dyolf Knip