Slashdot Mirror


Google Prefers DRAM to Hard Disks

KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

21 of 354 comments

  1. Additionally by Phosphor3k · · Score: 4, Insightful

    How often do you see DRAM fail compared to Hard Disks? A bit more reliability IMHO.

    1. Re:Additionally by VAXman · · Score: 4, Informative

      DRAM fails all the time. In fact, DRAM is almost certainly responsible for more data corruption than disks are. DRAM gets SBEs (single-bit errors) all the time, whereas when disks fail, they tend to go completely down and don't return corrupt data (which is preferable, IMHO). Of course, DRAM with ECC is significantly more reliable (and also more expensive).

  2. From the article: Why DRAM is so fast by yerricde · · Score: 5, Informative

    I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

    When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:

    Schmidt: "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks -- which is kind of amazing. It turns out that DRAM is 200,000 times more efficient when it comes to storing seekable data. In a disk architecture, you have to wait for a disk arm to retrieve information off of a hard-disk platter. DRAM is not only cheaper, but queries are lightning fast."

    With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
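    A rough sketch of that arithmetic in Python. The per-device seek rates are assumptions for illustration (about 10 ms per disk seek and about 100 ns per DRAM access); they are not figures from the article.

```python
def mirrors_needed(target_seeks_per_sec: int, seeks_per_sec_per_copy: int) -> int:
    """Copies of the data needed so the combined seek rate meets the target."""
    # Ceiling division on integers: round up to a whole number of copies.
    return -(-target_seeks_per_sec // seeks_per_sec_per_copy)


# Assumed figures, for illustration only.
DISK_SEEKS_PER_SEC = 100          # roughly one seek per 10 ms
DRAM_SEEKS_PER_SEC = 10_000_000   # roughly one access per 100 ns

TARGET = 1_000_000  # a million seeks served within one second

disk_copies = mirrors_needed(TARGET, DISK_SEEKS_PER_SEC)
dram_copies = mirrors_needed(TARGET, DRAM_SEEKS_PER_SEC)

print(f"disk mirrors needed: {disk_copies}")
print(f"DRAM copies needed:  {dram_copies}")
```

    Under those assumed rates, the disk setup needs thousands of mirrors to hit the target while a single DRAM copy has headroom to spare.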

    --
    Will I retire or break 10K?
    1. Re:From the article: Why DRAM is so fast by jackb_guppy · · Score: 4, Interesting

      A simpler way of saying this:

      Do you want to buy one machine that costs $100,000 and does 1 million hits per X time?

      -or-

      Do you want to buy 1,000 machines that cost $500 each and do 1,000 hits per X time apiece?

      In both cases we are talking about 1 million hits per X time.

      In case 1, it costs one port on the master switch and $100,000 for the machine.

      In case 2, it costs 1,000 ports on the master switch (actually more switches and infrastructure) AND $500,000 for the machines.

      Case 1 costs 20% of what case 2 does, i.e. it is 80% cheaper. And we have not talked about power, A/C, space... You need to look at the whole picture.
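      Plugging the numbers from this comparison into a few lines of Python (hardware prices only; the switch ports, power, A/C and space are left out because no figures are given for them, and the function names are just illustrative):

```python
def fleet_cost(machines: int, price_each: int) -> int:
    """Total purchase price of a fleet of identical machines."""
    return machines * price_each


def fleet_capacity(machines: int, hits_each: int) -> int:
    """Total hits per X time across the fleet."""
    return machines * hits_each


# Case 1: one big machine. Case 2: a thousand small ones.
big_cost = fleet_cost(1, 100_000)
small_cost = fleet_cost(1_000, 500)

big_hits = fleet_capacity(1, 1_000_000)
small_hits = fleet_capacity(1_000, 1_000)

cost_ratio = big_cost / small_cost

print(f"case 1: ${big_cost:,} for {big_hits:,} hits")
print(f"case 2: ${small_cost:,} for {small_hits:,} hits")
print(f"case 1 costs {cost_ratio:.0%} of case 2")
```

      Both cases deliver the same capacity, so the hardware comparison comes down to $100,000 against $500,000.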

  3. Re:Cost v Speed by Space+cowboy · · Score: 5, Interesting
    JohnHegarty scribbled

    I am sure the google archive is only a few 100gb


    Err. No.

    I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250GB of disk space for that minuscule cache. The one at Google must be of the order of a few hundred terabytes, not gigabytes.

    On that basis, I echo the original query about how it can be economical to use RAM...

    Simon
    --
    Physicists get Hadrons!
  4. Scary! by Anonymous Coward · · Score: 4, Insightful

    Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
    Now if only Google could go out and do its own fact-checking, it wouldn't need to rely on other newspapers at all. Mark my words, by 2010 Google will be the only place you go when you need information. Forget askjeeves, try listentogoogle. No humans will be involved. Scary.

    By the way, this guy can't speak for beans.
    The speech I give everyday is: "This is what we do. Is what you are doing consistent with that, and does it change the world?"

    1. Re:Scary! by Phosphor3k · · Score: 5, Funny

      The system goes on-line on August 4th, 1997. Human decisions are removed from strategic searching. Google begins to learn, at a geometric rate. It becomes self-aware at 2:14 am, eastern time, August 29th. In a panic, they try to pull the plug.

      Google fights back.

  5. Imperial MegaRam? by Ben+Jackson · · Score: 4, Interesting
    They may be referring to Imperial Technology's MegaRam solid state disks (SSDs). They claim about 36,000 IO/sec. Compare that with 80-120 IO/sec on a typical SCSI drive. I'm pretty sure that eBay is using them.

    I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.

  6. Fewer servers needed by michaelmalak · · Score: 5, Interesting
    I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?
    Google's Eric Schmidt probably means that fewer replicated servers are needed. If we take his stat of 200,000x speedup at face value, then you would need 200,000 times as many hard-drive-based servers as DRAM-based servers. There are many other factors involved such as communication delays and scalability, but you get the idea.

    This just shows how limited the lifespan of the 32-bit, 4GB architecture is, especially for servers.

  7. price comparison by karmma · · Score: 4, Informative

    Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.

    In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the hard drive option by a factor of greater than 25. If you do the math, it's possible.
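    The math, sketched in Python using the two prices quoted above. The speedup values passed in at the end are hypothetical inputs, not measurements.

```python
DRAM_DOLLARS_PER_GB = 250   # price quoted above
DISK_DOLLARS_PER_GB = 10    # price quoted above


def breakeven_speedup(dram_price: float, disk_price: float) -> float:
    """Performance advantage DRAM needs just to match disk on price/performance."""
    return dram_price / disk_price


def dram_is_cheaper(speedup: float) -> bool:
    """True when DRAM's price per unit of performance beats disk's."""
    return DRAM_DOLLARS_PER_GB / speedup < DISK_DOLLARS_PER_GB


print(breakeven_speedup(DRAM_DOLLARS_PER_GB, DISK_DOLLARS_PER_GB))

# Hypothetical speedups on either side of the break-even point.
for speedup in (10, 100):
    print(speedup, dram_is_cheaper(speedup))
```

    This treats performance as a single multiplier, which is a simplification; real workloads mix random and sequential access.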

    Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.

  8. Re:Cost v Speed by andykuan · · Score: 4, Insightful

    It's important to note, though, that he states DRAM is more efficient (cost-wise? speed-wise? whatever) when it comes to storing seekable data. I wonder if that means they're using DRAM for their search indices and plain old disk for their cached content. DRAM is ideal for completely random access to multiple pieces of data, whereas disk does okay for serial access to data, the location of which is well known.

  9. Something Nobody's Mentioned by Guppy06 · · Score: 4, Interesting

    DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them having at least two hard drive motors spinning 24/7.

  10. Re:Cost v Speed by leuk_he · · Score: 5, Interesting

    This makes more sense then:
    PC World: What are Google's biggest challenges?
    Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.


    If I had computers where 4 GB is not very much memory, and used as much memory as the rest of us use hard disk space, I would have a DRAM shortage too.

    And I bet they store only the most frequently used part of the index in memory.

    Did you notice that accessing the Google cache is very slow compared to a search? Even if that cached page is accessed frequently (because it references a /.ed site).

  11. Pretty amazing, but I can see it. by dinotrac · · Score: 5, Insightful

    Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:

    1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
    2. If disk latency means that CPUs sit idle, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
    3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
    4. Which leads, of course, to lower A/C bills during the warm weather.
    5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
    6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.

    OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?

  12. Overview of Today's Headlines by Corrado · · Score: 4, Insightful


    Another service that takes advantage of recency is something we just added called Overview of Today's Headlines. Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.


    This is a pretty cool idea. I only hope they make an RSS feed out of it so that I can use it in my company's new Portal environment. That would be really great! I love Google!

    Check it out here.

    --
    KangarooBox - We make IT simple!
  13. You guys are missing the point... by duffbeer703 · · Score: 4, Insightful

    DRAM requires little electricity and produces almost no heat.

    Hard disks consume large amounts of electricity, and produce large amounts of heat, since they consist of pieces of metal spinning at 7200rpm.

    Using DRAM costs quite a bit more up front, but it uses less electricity and requires fewer chillers, condensers, etc. to keep cool.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  14. The key is in the MTBF by eldurbarn · · Score: 5, Informative
    My last job was at one of the "other" search engines. We had a disk farm somewhat smaller than Google's (about 140 TB), mostly configured in RAID arrays, and we were swapping out dead bricks every few days.

    Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
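    A back-of-the-envelope sketch of why that happens. Both inputs are hypothetical (a 500,000-hour MTBF and 4,000 drives); they are not the actual figures from that disk farm, and the model assumes independent failures at a constant rate.

```python
def hours_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Expected time between failures somewhere in the fleet.

    Assumes drives fail independently at a constant rate, so the
    fleet-wide failure rate is drive_count times the per-drive rate.
    """
    return mtbf_hours / drive_count


# Hypothetical inputs, for illustration only.
ASSUMED_MTBF_HOURS = 500_000
ASSUMED_DRIVE_COUNT = 4_000

fleet_hours = hours_between_failures(ASSUMED_MTBF_HOURS, ASSUMED_DRIVE_COUNT)
print(f"one failure roughly every {fleet_hours / 24:.1f} days")
```

    A drive that would individually last decades on paper still turns into a swap every few days once there are thousands of them.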

    --
    -Eldurbarn
  15. Re:Cost v Speed by Space+cowboy · · Score: 5, Informative
    Alomex wrote:

    The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.


    I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.

    As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European Union site http://europa.eu.int. This is a fairly innocuous site, but if I do:



    cd /opt/search/var/sites/26_europa.eu.int
    du -sk .
    7731586 .


    That's a 7.7GB website, and that's just the text (in fact I only search for .htm, .asp, .php* and .html files). This particular website is growing at the rate of a couple of hundred MB each month.
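    For anyone converting the du figure themselves, here is a small Python sketch. It assumes du -sk reports 1024-byte units, as POSIX specifies for -k; reading the number loosely as thousands of kilobytes gives the rounder 7.7 figure, and either way the point about scale stands.

```python
def du_kib_to_sizes(kib_blocks: int) -> tuple[float, float]:
    """Convert a `du -sk` count of 1024-byte units to (decimal GB, binary GiB)."""
    n_bytes = kib_blocks * 1024
    return n_bytes / 10**9, n_bytes / 2**30


DU_OUTPUT = 7_731_586  # the number printed by du -sk above

size_gb, size_gib = du_kib_to_sizes(DU_OUTPUT)
print(f"{size_gb:.2f} GB (decimal), {size_gib:.2f} GiB (binary)")
```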

    I just think that your estimate for the cache size is a long way short of the real figure...

    Simon
    --
    Physicists get Hadrons!
  16. The Google feature I want by Hanzie · · Score: 4, Funny

    See that "mature content filter"?

    How about a "mature content ONLY search"?

    --
    ********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
  17. TOC, RAM vs. Steel Platter by eyepeepackets · · Score: 4, Informative

    Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.

    These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)

    Performance was absolutely unreal: more than twice the speed of SCSI, in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*

    If you have to do anything requiring lots of I/O (like databases), you _really_ do want one of these things or something like it.

    Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives), but if you are measuring time at all, then everything else loses the comparison. If you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units, because they truly are part of the cost (read: necessary).

    Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.

    As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.

    --
    Everything in the Universe sucks: It's the law!
  18. Re:Hard disk is an obsolete technology by Dyolf+Knip · · Score: 4, Interesting
    So hard drives are about 10 years ahead of RAM in terms of $/MB? Sounds about right. 1GB hard drives were on the high end of normal users at the time, as is 1GB of RAM today (though I seem to recall having more than 10MB RAM at the time). Assuming the same increases in the next decade... 100GB RAM and 10TB drives. I like.
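    The extrapolation, sketched in Python. It assumes capacity at a given price point grows 100x per decade, which is what the 1GB-to-100GB and 100GB-to-10TB jumps above imply; the 100GB starting point for drives is inferred from the 10TB result rather than stated.

```python
def extrapolate(size_gb: float, growth_per_decade: float, decades: int = 1) -> float:
    """Project a capacity forward assuming a fixed growth factor per decade."""
    return size_gb * growth_per_decade ** decades


GROWTH_PER_DECADE = 100  # assumed, implied by the figures above

ram_in_ten_years = extrapolate(1, GROWTH_PER_DECADE)      # from 1 GB of RAM
disk_in_ten_years = extrapolate(100, GROWTH_PER_DECADE)   # from a 100 GB drive

# The same growth expressed as a yearly multiplier.
yearly_factor = GROWTH_PER_DECADE ** (1 / 10)

print(f"RAM:  {ram_in_ten_years:,.0f} GB")
print(f"Disk: {disk_in_ten_years / 1000:,.0f} TB")
print(f"yearly growth factor: {yearly_factor:.3f}")
```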

    Solid state everything would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives which have increased only slightly in that regard.

    As someone else said, it is only a matter of time.

    --
    Dyolf Knip