Slashdot Mirror


Google Prefers DRAM to Hard Disks

KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

23 of 354 comments

  1. From the article: Why DRAM is so fast by yerricde · · Score: 5, Informative

    I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

    When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:

    Schmidt: "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks -- which is kind of amazing. It turns out that DRAM is 200,000 times more efficient when it comes to storing seekable data. In a disk architecture, you have to wait for a disk arm to retrieve information off of a hard-disk platter. DRAM is not only cheaper, but queries are lightning fast."

    With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
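
    The seeks-per-second argument above can be sketched in a few lines. The latencies below are illustrative assumptions, not figures from the interview: roughly 10 ms per random disk seek and roughly 50 ns per random DRAM access, which happens to give the 200,000x ratio Schmidt quotes.

```python
# Assumed latencies (illustrative, not taken from the interview).
DISK_SEEK_S = 10e-3     # ~10 ms per random disk seek
DRAM_ACCESS_S = 50e-9   # ~50 ns per random DRAM access


def seeks_per_second(latency_s: float) -> float:
    """Random accesses one device can serve per second."""
    return 1.0 / latency_s


def wall_time_s(total_seeks: int, latency_s: float, mirrors: int = 1) -> float:
    """Time to serve total_seeks spread evenly over identical mirrors."""
    return total_seeks * latency_s / mirrors


if __name__ == "__main__":
    n = 1_000_000
    ratio = seeks_per_second(DRAM_ACCESS_S) / seeks_per_second(DISK_SEEK_S)
    print(f"one disk:      {wall_time_s(n, DISK_SEEK_S):10.2f} s")
    print(f"1,000 mirrors: {wall_time_s(n, DISK_SEEK_S, 1000):10.2f} s")
    print(f"DRAM:          {wall_time_s(n, DRAM_ACCESS_S):10.2f} s")
    print(f"DRAM/disk seek ratio: {ratio:,.0f}x")
```

    Under these assumptions a single disk needs hours for a million seeks, a 1,000-way mirror needs about ten seconds, and DRAM needs a fraction of a second with no mirroring at all.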

    --
    Will I retire or break 10K?
  2. Once again a simplistic view by damieng · · Score: 3, Informative

    I often see comments like this from people who have little experience in business.

    What you pay for the initial product is not what it "costs" in the long-term. Businesses have a term for this called TCO or Total Cost of Ownership. It includes all the other time and materials needed to keep the item in use.

    I would imagine in this case that the simple reason is that while DRAM is more expensive to purchase, it is a *lot* less expensive to run, the primary cost being power.

    Also consider that if speed is of the essence, as it is with Google, it's not 50GB of RAM vs a 50GB cheap-n-cheerful IDE drive. A 50GB Ultra160 drive costs considerably more than an IDE drive and still won't come near the DRAM for speed.

    --
    [)amien
  3. Re:I've always wondered by uncl_bob · · Score: 1, Informative

    Actually, not that much of the operating system is pulled from the hard drive once the system is up. Maybe some special parts of Windows like IE and other things would benefit from being in RAM, but not the whole C:\windows-tree.

  4. Re:I've always wondered by propstoalldeadhomiez · · Score: 1, Informative

    There's an option in Win2k to not swap portions of the kernel out. If you have 128 MB of RAM or more, it's probably a good idea, too. The whole thing doesn't need to be in memory the whole time, just what you use the most.

    --

    Jack Buck (1924-2002)
    Darryl Kile (1968-2002)
  5. I believe it... by josh+crawley · · Score: 3, Informative

    At my dad's work, they use a type of chip, but it's not dram. They use E^2prom. True, you do take a performance hit, but they have 10 "gig ethernet ports" on the thing. The last price quote I got was $12000 for a terabyte of this stuff. Don't forget to compare price/performance ratios to the best chipsets of IDE (or if you're a scsi bigot, SCSI). Pulling random data is very easy for chips, but HD's of ANY speed and quality are still slower.

    Josh Crawley

  6. Re:I've always wondered by MarkusQ · · Score: 2, Informative
    Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?

    AFAIK Linux and Open BSD cannot do this either. It seems amazing to me that people have missed this idea.

    You can do it in Linux (and probably in Windows too, though I'm not sure how)--but there generally isn't a reason to. The VM/RD cycle swings back and forth over the years, but at present the PC world seems to be running best with a 2:1 VM ratio (using a chunk of HD about twice your RAM size to simulate more RAM), although part of this is that RAM is being used up by smart caching of disk. This holds for Windows, Linux, and (IIRC) Open BSD.

    So, the short answer is: you could do it, but it would likely slow you down overall.

    -- MarkusQ

  7. Five minute rule by NearlyHeadless · · Score: 3, Informative
    The raw cost of DRAM ($/MB) is still much higher, but that is not the complete analysis. Database god Jim Gray's analysis shows that you should keep data in memory if it is going to be accessed every five minutes or less.


    See The Five-Minute Rule, ten years later (Word Doc) or its HTML-ified Google Cache
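
    For readers who want the arithmetic behind the rule, here is a minimal sketch of the break-even formula from Gray's papers. The sample inputs are the late-1990s figures as commonly quoted from the follow-up paper (8KB pages, about 64 random accesses per second per disk, $2,000 per drive, $15 per MB of DRAM); treat them as illustrative rather than authoritative.

```python
def break_even_interval_s(pages_per_mb_ram: float,
                          accesses_per_s_per_disk: float,
                          price_per_disk: float,
                          price_per_mb_ram: float) -> float:
    """Re-reference interval (seconds) below which caching a page in RAM
    is cheaper than buying disk arms to fetch it each time."""
    technology_ratio = pages_per_mb_ram / accesses_per_s_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


if __name__ == "__main__":
    # Illustrative late-1990s inputs (assumed, see the note above).
    interval = break_even_interval_s(128, 64, 2000.0, 15.0)
    print(f"break-even: {interval:.0f} s, about {interval / 60:.1f} minutes")
```

    Plugging in current prices and disk access rates shows how the break-even interval moves as DRAM gets cheaper relative to disk arms.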

  8. price comparison by karmma · · Score: 4, Informative

    Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.

    In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the hard drive option by a factor of greater than 25. If you do the math, it's possible.
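
    The break-even factor comes straight from the two prices quoted above. A small sketch, where the speedup values passed in at the end are hypothetical inputs rather than measurements:

```python
def break_even_speedup(dram_price_per_gb: float, disk_price_per_gb: float) -> float:
    """How many times faster DRAM must be to match disk on price/performance."""
    return dram_price_per_gb / disk_price_per_gb


def dram_is_cheaper(dram_price_per_gb: float, disk_price_per_gb: float,
                    speedup: float) -> bool:
    """True if DRAM costs less per unit of throughput than disk."""
    return speedup > break_even_speedup(dram_price_per_gb, disk_price_per_gb)


if __name__ == "__main__":
    print(break_even_speedup(250, 10))        # prices from the comment
    print(dram_is_cheaper(250, 10, 200_000))  # Schmidt's claimed ratio
    print(dram_is_cheaper(250, 10, 10))       # a hypothetical modest speedup
```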

    Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.

  9. Re:I've always wondered by Cylix · · Score: 3, Informative

    I looked into using a virtual ram disk for a section of data that was being accessed quite frequently. Of course I did some reading and it turned out not to be terribly necessary.

    The more memory present in the system, the more memory the Linux kernel dedicates to caching. Thus commonly read files are in memory and have incredibly fast reads. This is performed auto-magically without the user even being aware of it.

    Of course no two situations are exact and you may have a purpose for dedicating a ram disk to something. There are instances where you may want a fast read/response time, but the file isn't commonly used. Such as the data for a squid proxy cache. A ram disk in such a situation would be entirely helpful.

    --
    "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
  10. The key is in the MTBF by eldurbarn · · Score: 5, Informative
    My last job was at one of the "other" search engines. We had a disk farm somewhat smaller than Google (about 140 Tb), mostly configured in RAID arrays, and we were swapping out dead bricks every few days.

    Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
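
    A back-of-the-envelope sketch of why fleet size dominates. Only the 140 TB figure comes from the post; the 36 GB drive size and the 500,000-hour MTBF are assumptions chosen for illustration, and the model treats drives as independent with a constant failure rate and ignores RAID parity or mirror drives.

```python
import math


def drives_needed(total_gb: float, drive_gb: float) -> int:
    """Number of drives required to reach the raw capacity."""
    return math.ceil(total_gb / drive_gb)


def days_between_failures(n_drives: int, mtbf_hours: float) -> float:
    """Expected days between failures somewhere in the fleet."""
    return mtbf_hours / (n_drives * 24)


if __name__ == "__main__":
    n = drives_needed(140_000, 36)   # 140 TB (from the post), 36 GB drives (assumed)
    gap = days_between_failures(n, 500_000)  # MTBF is an assumed spec-sheet value
    print(f"{n} drives, one failure every {gap:.1f} days on average")
```

    With these inputs the fleet loses a drive roughly every five days, which is in line with "swapping out dead bricks every few days".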

    --
    -Eldurbarn
  11. Re:Fewer servers needed by The+Smith · · Score: 2, Informative

    Yes, but it's all rather confusing. Read this thread in the Linux kernel mailing list if you're really interested. (WARNING: You won't understand any of it unless you know how the x86 virtual memory mechanism works.)

  12. Re:Cost v Speed by Space+cowboy · · Score: 5, Informative
    Alomex wrote:

    The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.


    I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.

    As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European union site http://europa.eu.int. This is a fairly innocuous site, but if I do:



    cd /opt/search/var/sites/26_europa.eu.int
    du -sk .
    7731586 .


    That's a 7.7Gb website, and that's just the text (in fact I only search for .htm, .asp, .php* and .html files). This particular website is growing at the rate of a couple of hundred Mb each month.

    I just think that your estimate for the cache size is a long way short of the real figure...

    Simon
    --
    Physicists get Hadrons!
  13. DRAM probably is cheaper...Here's why. by Bowie+J.+Poag · · Score: 3, Informative



    It's not a fair comparison to put 1GB worth of DRAM on one side of the scale, and 1GB worth of physical storage on the other. The hard disk will obviously come out to be the cheaper of the two. However, for a company like Google, which undoubtedly uses RAID technology for storage, you're effectively not getting the same "bang for your buck" as you would with a JBOD array. In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass something like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.

    Mind you, that's not to say that RAID is a bad technology..heh, hardly. It's just that you can't make a 1 to 1 comparison from DRAM to physical without taking into account the storage methods employed by each.
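
    To make the overhead concrete, here is a small sketch of raw capacity needed per usable terabyte. The 2TB-for-1TB figure in the comment corresponds to mirroring; the RAID 5 case and the group size are added assumptions for comparison.

```python
def raw_needed_tb(usable_tb: float, scheme: str, disks_per_group: int = 0) -> float:
    """Raw disk capacity required to expose usable_tb of storage."""
    if scheme == "jbod":      # no redundancy
        return usable_tb
    if scheme == "mirror":    # every block stored twice
        return usable_tb * 2
    if scheme == "raid5":     # one disk's worth of parity per group
        if disks_per_group < 3:
            raise ValueError("RAID 5 needs at least 3 disks per group")
        return usable_tb * disks_per_group / (disks_per_group - 1)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    print(raw_needed_tb(1.0, "jbod"))
    print(raw_needed_tb(1.0, "mirror"))
    print(raw_needed_tb(1.0, "raid5", disks_per_group=5))
```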

    Cheers

    --
    Bowie J. Poag

  14. Re:Additionally by VAXman · · Score: 4, Informative

    DRAM fails all the time. In fact, DRAM is almost certainly responsible for more data corruption than disks are. DRAM gets SBEs (single-bit errors) all the time, whereas when disks fail, they tend to go completely down and don't return corrupt data (which is preferable, IMHO). Of course, DRAM with ECC is significantly more reliable (and also more expensive).

  15. Re:Google is great... by SpinyNorman · · Score: 3, Informative

    Um.. they do.

    AND is by default
    OR is OR
    NOT is -

    I don't think parentheses for grouping work though (they don't mention it), so you can't do more complex queries, but you can certainly do:

    A AND (B OR C) AND !D

    Which would be: A B OR C -D
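
    The mapping above can be written as a tiny helper. This is only a sketch of the operator table in this comment (implicit AND, OR, and a leading minus for NOT); the function name and parameters are made up for illustration, and it makes no claim about how Google actually parses queries.

```python
def to_google_query(required, any_of=(), excluded=()):
    """Build a query string: required terms are ANDed implicitly,
    any_of terms are joined with OR, excluded terms get a leading '-'."""
    parts = list(required)
    if any_of:
        parts.append(" OR ".join(any_of))
    parts.extend("-" + term for term in excluded)
    return " ".join(parts)


if __name__ == "__main__":
    # A AND (B OR C) AND !D
    print(to_google_query(["A"], any_of=["B", "C"], excluded=["D"]))
```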

  16. Re:Additionally by Spoing · · Score: 2, Informative
    RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.

    When's the last time you checked your RAM? I get about 1 bad module for every 2 machines. Defects usually show up on the initial test, though some don't show up for a few years.

    Don't believe me? Try it yourself; Memtest86. I suggest running one full test (can take days) when you first build a machine, and when you run into odd problems that you can't figure out. The default tests are good, but I've had times where it did miss problems.

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  17. TOC, RAM vs. Steel Platter by eyepeepackets · · Score: 4, Informative

    Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.

    These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)

    Performance was absolutely unreal: more than twice the speed of SCSI, in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*

    If you have to do anything requiring lots of I/O (like database,) you _really_ do want one of these things or something like it.

    Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives) but if you are at all measuring time, then everything else loses the comparison; if you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units because they truly are part of the cost (read necessary.)

    Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.

    As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.

    --
    Everything in the Universe sucks: It's the law!
  18. Re:Cost v Speed by Yokaze · · Score: 3, Informative
    > each of which occupies how many bytes in index files?

    According to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page, the inverted index ("inverted barrels") was about 47.2Gb (total data without repository 55.2Gb, repository 53.5Gb). It had about 24 million web pages indexed. Assuming a linear increase this amounts to about 5Tb.
    But, to quote from the paper:

    With better encoding and compression of the document index, a high quality web search engine may fit onto a 7Gb drive of a new PC.

    Which is surely slightly exaggerated, but shows that they considered that there is room for improvement. (E.g using varying length index instead of fixed width)
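
    As an illustration of what a variable-length index encoding buys, here is a generic sketch: sorted document IDs are stored as gaps, and each gap as a base-128 varint. This is a textbook technique, not the encoding described in the paper, and the sample IDs are invented.

```python
def varint_encode(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first.
    The high bit of each byte marks that more bytes follow."""
    if n < 0:
        raise ValueError("varint_encode needs a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_postings(doc_ids) -> bytes:
    """Encode a sorted list of document IDs as varint-coded gaps."""
    out, prev = bytearray(), 0
    for doc_id in doc_ids:
        out += varint_encode(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes) -> list:
    """Inverse of encode_postings."""
    ids, prev, cur, shift = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += cur
            ids.append(prev)
            cur, shift = 0, 0
    return ids


if __name__ == "__main__":
    ids = [3, 130, 131, 70000]          # invented sample posting list
    packed = encode_postings(ids)
    print(len(packed), "bytes vs", 4 * len(ids), "bytes at a fixed 4-byte width")
    assert decode_postings(packed) == ids
```

    Small gaps between neighbouring IDs fit in a single byte, which is where the saving over a fixed-width layout comes from.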

    >I don't think Linux can do it
    At least they think it can do it, since they are using Linux boxes, at least according to

    The Technology Behind Google, by Jim Reese CEO.
    More than 10,000 Linux boxes, that is.
    --
    "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
  19. Re:Additionally by Hal-9001 · · Score: 3, Informative
    RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.
    RAM is not mechanical, it's capacitive, i.e. it operates by storing charge. One of the advantages of semiconductor, or solid-state, electronics over pre-transistor electromechanical relays and vacuum tubes is that they require no moving parts, making them more rugged and reliable.
    Defects usually show up on the initial test, though some don't show up for a few years.
    A curious thing about solid-state electronics is that a large number of parts fail initially, then the failure rate is constant for several years, and then the failure rate increases again. This is why electronics like CPUs and DRAM usually have a warranty of 30 days, because 99.9% of parts that are going to fail do so in 30 days. Contrast this with mechanical failure, which continually increases with time.
    --
    "It take 9 months to bear a child, no matter how many women you assign to the job."
  20. Re:Additionally by Chmarr · · Score: 3, Informative
    Ram has both an electronic component, and mechanical. Try this experiment: Take the RAM out of your computer and throw it at your workmate/housemate/mum. He or she will say 'Ow!', and it's not because he or she was hit by electrons!

    RAM heats up as it's used, metal expands, the Chips on that little PCB stretch slightly, joints weaken with each power cycle, sometimes they fragment. The same thing with the connectors to the motherboard.

    Telstra, in Australia, was having a hellish time with certain Cisco routers as the RAM heating up would eventually work its way out of the socket, crashing the router!

  21. Re:You guys are missing the point... by kesuki · · Score: 3, Informative

    With over 35 DRAM chips on the American market what good does it do to check only a single type of memory module from a single maker?
    However, since I don't want to spend the rest of the day finding the lowest-power DRAM module with the highest capacity, I will assume that the best-case scenario is 4GB of RAM using approximately the power of two HDs of any capacity. Beyond 4GB you would require either a custom DRAM NAS/HD or a second PC. However, a DRAM NAS with multiple gigabit ethernet ports offers the most DRAM storage per watt of electricity. Still, it is at least 4x as power hungry as an 8 HD 1TB RAID server. Assume each DRAM chip in the NAS is 64 megabytes: to reach one terabyte we need 16 thousand DRAM chips. Obviously, if each chip requires even .1 watts to operate, they're using 1600 watts of power. The HD server may need a peak of 500+ watts, but even under load it isn't using as much as when all 8 drives spin up, so it's probably only using 400 watts total for the whole system under load.

    While it's pretty clear that power isn't an area where Google can save money using DRAM over HD, and while DRAM is solid state and if it doesn't fail in the first 6 months it probably won't fail in the first 100 years, it is still going to become obsolete long before it fails, requiring replacement. I've also figured that at $4 a DRAM chip the cost of 1TB is $64,000 vs $5,000 for a total-package 1TB HD server. Even if you replaced the drives every 6 months it would take 15 years before the cost of materials on HDs exceeded the cost of materials on DRAM. However, there is a cost savings. First of all, if you're mirroring the drives, that doubles the electrical and material cost of the HD storage. Second of all, that 1TB HD server is only going to have its seek time saturated by only 100 megabit ethernet, unless the data is entirely sequential (not requiring seek time); even in the case of sequential data a single gigabit ethernet is sufficient.
    That 1TB of DRAM has at worst 12 ns latency, or .000000012 seconds per seek. That provides 83,333,333 seeks per second. The only thing he was wrong about is that DRAM isn't 200,000 times faster than HD for data that requires seeks; it's on the order of millions of times more efficient. The 200,000 figure is probably based on real-world performance differences, from using DRAM vs HD in a "real world" setting and not just on paper. That means trying to replicate the speed of DRAM with hard drives is a futile task.
    Far more futile than trying to replicate the capacity of HDs with DRAM.
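
    The figures in this comment can be re-derived in a few lines. Every input below is the comment's own assumption (64 MB chips, 0.1 W and $4 per chip, 12 ns latency); the sketch uses binary units, so the results land slightly above the rounded "16 thousand", 1600 W and $64,000.

```python
MB_PER_TB = 1024 * 1024  # binary units


def chips_for(capacity_mb: int, chip_mb: int) -> int:
    """Number of chips needed to reach the capacity (rounded up)."""
    return -(-capacity_mb // chip_mb)


def seeks_per_second(latency_s: float) -> float:
    """Random accesses per second at a given per-access latency."""
    return 1.0 / latency_s


if __name__ == "__main__":
    chips = chips_for(MB_PER_TB, 64)   # 64 MB per chip (comment's assumption)
    power_w = chips * 0.1              # 0.1 W per chip (comment's assumption)
    cost_usd = chips * 4               # $4 per chip (comment's assumption)
    print(f"{chips} chips, {power_w:.0f} W, ${cost_usd:,}")
    print(f"{seeks_per_second(12e-9):,.0f} seeks/s at 12 ns")
```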

  22. Re:Scary! by Mr+Z · · Score: 2, Informative

    And mine, too. Actually, in case you didn't recognize it, the original poster's scenario comes directly from the Terminator series. Skynet became sentient on August 29th, 1997. (Which was, incidentally, my 22nd birthday.)

    --Joe
  23. Re:Additionally by alex_ant · · Score: 2, Informative

    I agree that DRAM is certainly more reliable than hard disk storage, but I should point out that a computer's power-up "memory test" is more like a "memory count" than anything. The machine says it's "testing" the memory, but it's basically paging through it to make sure it's all there. It will miss all but the most severe memory problems.

    I speak from experience, as the owner of several past flaky PCs that had bad RAM, and the owner of an SGI Indigo2, which had a SIMM that would get parity errors every now and then that the POST (or whatever it's called on SGIs) would fail to detect. If you really want to test the memory, you're going to have to run some real memory-test software, which typically takes a loooong time to run (hours or days). That's because a great number of memory errors happen only slightly too frequently to be called flukes.