Slashdot Mirror


Google Prefers DRAM to Hard Disks

KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

24 of 354 comments

  1. Additionally by Phosphor3k · · Score: 4, Insightful

    How often do you see DRAM fail compared to Hard Disks? A bit more reliability IMHO.

    1. Re:Additionally by darkwhite · · Score: 3, Insightful

      Very often. And the problem is, unlike hard drives, which will try their best not to return the data if they have a hint that it's corrupted (meta-data, checksums, etc.), DRAM will be more than happy to return the incorrect data, which then might get written to disk. Some of the errors I've seen due to corrupt DRAM are pretty amusing.

    2. Re:Additionally by SilentChris · · Score: 3, Insightful

      I've seen a lot of "logic" arguments to this post, but I think people are missing a sort of obvious one: size. If you had as much RAM as an average hard drive (say, 20 gigs), I'm sure that at least *one* piece would be faulty. You're comparing, in a best-case server scenario, a gig of RAM vs. an 80-gig hard drive. I think if the numbers were even it'd be a "fairer" fight.

  2. Speed saves by coreman · · Score: 3, Insightful

    They make their money on hits served, so speed is far more cost effective than the cost of the storage medium. If they can speed up serving hits, they're ahead of the game.

  3. Scary! by Anonymous Coward · · Score: 4, Insightful

    Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
    Now if only Google could go out and do its own fact-checking, it wouldn't need to rely on other newspapers at all. Mark my words, by 2010 google will be the only place you go when you need information. Forget askjeeves, try listentogoogle. No humans will be involved. Scary.

    By the way, this guy can't speak for beans.
    The speech I give everyday is: "This is what we do. Is what you are doing consistent with that, and does it change the world?"

  4. The key to it being cheaper is.... by rayd75 · · Score: 3, Insightful

    That it can handle many clients with little latency... You'd have to duplicate the data across a huge number of disks to provide similar response time to clients. Sure, if you were the only client, you couldn't tell the difference but with thousands upon thousands of clients all seeking data that would be stored in different locations on a disk things would quickly grind to a halt. Because so much unrelated data is being requested, seek time is the key. Sure, memory is more expensive per meg but its ability to serve so many more clients makes it less expensive overall.
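
    A rough sketch of that seek-time argument, assuming about 10 ms per random disk read and about 100 ns per random DRAM read. Both latencies and the 50,000 reads-per-second target are illustrative assumptions, not figures from the interview:

```python
# Illustrative assumptions only: none of these numbers come from Google.
DISK_RANDOM_READ_NS = 10_000_000   # assumed ~10 ms seek plus rotation
DRAM_RANDOM_READ_NS = 100          # assumed ~100 ns memory access
NS_PER_SECOND = 1_000_000_000


def reads_per_second(access_ns):
    """Random reads one device can serve per second, one at a time."""
    return NS_PER_SECOND // access_ns


def copies_needed(target_reads_per_s, access_ns):
    """Devices, each holding a full copy of the data, needed to hit the target."""
    rate = reads_per_second(access_ns)
    return -(-target_reads_per_s // rate)  # ceiling division


target = 50_000  # hypothetical aggregate random reads per second
disk_copies = copies_needed(target, DISK_RANDOM_READ_NS)
dram_copies = copies_needed(target, DRAM_RANDOM_READ_NS)
print(f"disk copies: {disk_copies}, DRAM copies: {dram_copies}")
```

    Under these assumptions the disk side needs hundreds of replicas just to absorb the seeks, while a single in-memory copy keeps up.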

  5. Re:Once again a simplistic view by NNKK · · Score: 2, Insightful

    Stack reliability, as someone else mentioned, on top of power and speed savings.

    Personally, I seriously doubt that all or even close to all of the stuff Google stores is kept in DRAM. It's more likely they keep newer data and high-access data in DRAM, while older stuff gets archived to disk, available for recall later, but slower.

  6. Re:Cost v Speed by Alomex · · Score: 3, Insightful

    AFAIK, Google does not cache images, only HTML text. The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.

    Hence the size of the cache is somewhere between 500GB and 3TB, plus the index would be another 40% of that.

    My best guess is that the Google archive is somewhere around 2-3 terabytes, and that the total amount of DRAM available at Google at the present time is somewhere between 5 and 10 terabytes.
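
    The estimate can be reproduced with a few lines of arithmetic. This is only a sketch using the ranges quoted in this comment; treating 1 TB as 1000 GB is an assumption:

```python
# Ranges quoted in the comment; decimal units (1 TB = 1000 GB) are an assumption.
WEB_SIZE_GB = (5_000, 10_000)   # estimated web size, low and high
TEXT_PERCENT = (12, 30)         # text as a share of the web, low and high
INDEX_PERCENT = 40              # index size relative to the text cache


def text_cache_gb():
    """Low and high estimates for the cached text, in GB."""
    low = WEB_SIZE_GB[0] * TEXT_PERCENT[0] // 100
    high = WEB_SIZE_GB[1] * TEXT_PERCENT[1] // 100
    return low, high


def cache_plus_index_gb():
    """Text cache plus an index that adds another 40% on top."""
    low, high = text_cache_gb()
    factor = 100 + INDEX_PERCENT
    return low * factor // 100, high * factor // 100


print(text_cache_gb())        # cache only
print(cache_plus_index_gb())  # cache plus index
```

    The low end works out to 600 GB rather than 500 GB, which is close enough for a back-of-the-envelope figure, and the guess of 2-3 terabytes sits inside the range once the index is added.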

  7. Re:Cost v Speed by andykuan · · Score: 4, Insightful

    It's important to note, though, that he states DRAM is more efficient (cost-wise? speed-wise? whatever) when it comes to storing seekable data. I wonder if that means they're using DRAM for their search indices and plain old disk for their cached content. DRAM is ideal for completely random access to multiple pieces of data, whereas disk does okay for serial access to data, the location of which is well known.

  8. Hard disk is an obsolete technology by DrD8m · · Score: 1, Insightful

    Today's new computers have 256 or 512 MB of RAM, which is what we had as hard disk space 10 years ago (386-486 era). Every day RAM gets cheaper, and IMHO a spinning disk fails too often and is far too slow to work with on overloaded servers. RAM gives us almost instant access to data and doesn't fail like a hard disk does.
    I hope soon we'll use some kind of RAM for everything instead of a disk.

  9. Re:price comparison by bdolan · · Score: 2, Insightful

    If you have heavily hit database indexes, as Google does, then you may need 100-1000x fewer machines. The cost of the disks is not the important cost; what matters is the far smaller number of machines needed for an equivalent query rate. However, you want far more than 2 GB of directly addressed RAM per machine. In fact, at current prices it is probably cost effective to put hundreds of GB in each machine if you need to keep queries RAM-based, even if the CPUs are dwarfed in cost by the RAM.

    This is one of the reasons that we need 64-bit addressability on commodity IA architecture ASAP. RAM drives that go through an I/O subsystem add a huge overhead compared to indexing into arrays and natural data organization, as opposed to fixed blocks of bytes that have to be retrieved as a unit, with hundreds of instructions and security models in the way of access!

  10. Bottlenecks... by percey · · Score: 3, Insightful

    More often than not with a database your bottleneck is I/O. When you run a database you cannot have enough disks, and you cannot have enough FAST disks. In order to accomplish the kind of I/O bandwidth that a place like Google is going to need, you're going to need the best EMC arrays (or perhaps an IBM Shark) money can buy. And guess what? They run you megabucks. You can't just take a bunch of SCSI disks and expect them to perform as well as Fibre Channel arrays. You gotta have controllers with multiple caches. Everyone who's never dealt with databases thinks that SCSI is the beginning and the end of hard drives, and it's so far from being the truth it's not funny.

    I've really no idea how complex the queries are or whether or not they use a relational database, but that being said, it still has to hit the disk to retrieve the data, and that's where every decently designed database's bottleneck is. Besides, Google caches all its pages. Egads! Do you have any idea how much RAM they must need for just that alone? Yes, RAM is faster. Oracle even teaches you to try to keep your frequently used tables in cache anyhow, because it's fastest; of course, they qualify that with the word "small," realizing that most people don't have the gobs of memory needed to cache large tables.

  11. More importantly than the DRAM... by LatJoor · · Score: 2, Insightful

    Although it's not mentioned in the Slashdot writeup, I think that probably the most important part of this interview was the discussion of Google's business model and future. It's good to see that they're committed to not getting in over their heads with extraneous services. They've found a business model that works and they're sticking to it, rather than getting greedy and adding dumb new services that have nothing to do with searching, or "search," as he put it.

    A lot of technology companies would do very well to follow Google's example, it seems to me. They're proving that Internet services are a perfectly sound venture if the company has a sensible business model and always keeps focused on providing quality technology and services in the area that they know best.

  12. Pretty amazing, but I can see it. by dinotrac · · Score: 5, Insightful

    Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:

    1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
    2. If disk latency means that cpus spend idle time, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
    3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
    4. Which leads, of course, to lower A/C bills during the warm weather.
    5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
    6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.

    OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?

  13. Overview of Today's Headlines by Corrado · · Score: 4, Insightful


    Another service that takes advantage of recency is something we just added called Overview of Today's Headlines. Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.


    This is a pretty cool idea. I only hope they make an RSS feed out of it so that I can use it in my company's new portal environment. That would be really great! I love Google!

    Check it out here.

    --
    KangarooBox - We make IT simple!
  14. Re:RAM Disks by graxrmelg · · Score: 3, Insightful

    Google doesn't need petabytes of storage. Right now they claim 2 billion Web pages, 700 million Usenet messages, and 330 million images. That's a total of 3 billion things. Let's wildly overestimate their average size as 100K (remember that the Usenet archive doesn't include binaries). The storage space required would be 3e9 * 1e5 = 3e14, or 300 TB.

    It's probably true that 20 TB isn't enough for Google, but it's not true (and won't be for quite a while) that the cached pages and Usenet archive require "a few PB".
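
    The same overestimate, worked through as a sketch. The item counts and the 100K average are the figures quoted in this comment; decimal units (1 TB = 10^12 bytes) are an assumption:

```python
# Counts and average size as quoted in the comment; not independently verified.
ITEMS = {
    "web_pages": 2_000_000_000,
    "usenet_messages": 700_000_000,
    "images": 330_000_000,
}
AVERAGE_ITEM_BYTES = 100_000   # the deliberately generous 100K average
BYTES_PER_TB = 10**12          # decimal terabyte, an assumption

total_items = sum(ITEMS.values())
total_bytes = total_items * AVERAGE_ITEM_BYTES
total_tb = total_bytes // BYTES_PER_TB

print(f"{total_items:,} items, about {total_tb} TB")
```

    Even with the generous average size, the total stays in the hundreds of terabytes, well short of a petabyte.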

  15. You guys are missing the point... by duffbeer703 · · Score: 4, Insightful

    DRAM requires little electricity and produces almost no heat.

    Hard disks consume large amounts of electricity, and produce large amounts of heat, since they consist of pieces of metal spinning at 7200rpm.

    Using DRAM costs quite a bit more upfront, but it uses less electricity and requires fewer chillers, condensers, etc. to keep cool.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  16. Re:Cost v Speed by Anonymous Coward · · Score: 1, Insightful

    and that's just the text (in fact I only search for .htm, .asp, .php* and .html files).

    If you're searching for php and asp files, you could very well be pulling from their database, not just "Web" pages. I run a web server for an online store which consists of a few (15-20) megs of phtml/html/gif/jpg, but if you try to mirror the site you will cycle through our entire MySQL database of products and end up with a couple of gigs of dynamically generated pages.

  17. Re:Google is great... by russianspy · · Score: 2, Insightful

    They do. Read the guide. You can include parentheses, AND, and OR. I don't remember if they allow XOR and others. Oh... They allow negation as well.

  18. quick math by Anonymous Coward · · Score: 1, Insightful


    Let's assume that Google needs 100 TB of data. Possibly not correct, but probably not off by more than an order of magnitude either high or low.

    Let's just take a look at Sharky's RAM price guide, and we see that a 512 meg module costs about $75, or $125 if it's ECC. So one gig of RAM costs between $150 and $250.

    Assuming they used some sort of non-standard computer system that supports vast quantities of RAM (so the system price is almost entirely dependent on RAM prices), then we find that one TB costs about $200,000 or $300,000. This assumes that a box which can hold 1 TB of RAM (2,000 of the 512 MB modules) costs about $50,000. Perhaps not beyond reason. Maybe it costs more, but once again it should be within an order of magnitude (no more than a million $ or so).

    If they have 100 TB of stuff they need to store, then that comes to a grand total of $30,000,000 to store it in ECC DRAM. Not unreasonable.

    Of course, if the database size is only about 10 TB, then the total cost is more like $3,000,000, which is pennies for Google (probably). Basically, RAM is not so expensive that huge quantities of data cannot be stored in it, if one is determined.
    In addition, the power dissipation would be very low, fewer power supplies, fewer servers of every sort, etc.... Do you think you could build a massive fiber channel RAID array that would serve Google's needs for $3-30 million?
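
    For anyone who wants to vary the inputs, the arithmetic above can be sketched as follows. The module prices and the $50,000 box are the guesses from this comment, and the function names are made up for illustration:

```python
# Prices are the comment's own guesses, in US dollars.
MODULES_PER_TB = 2_000                     # 512 MB modules per TB, as counted above
MODULE_PRICE = {"plain": 75, "ecc": 125}   # price per 512 MB module
BOX_PRICE_PER_TB = 50_000                  # assumed cost of a box holding 1 TB


def cost_per_tb(kind):
    """Memory plus chassis cost for one terabyte of the given module type."""
    return MODULES_PER_TB * MODULE_PRICE[kind] + BOX_PRICE_PER_TB


def total_cost(terabytes, kind="ecc"):
    """Total cost of holding the whole data set in RAM."""
    return terabytes * cost_per_tb(kind)


for size_tb in (10, 100):
    print(f"{size_tb} TB in ECC DRAM: ${total_cost(size_tb):,}")
```

    Changing `BOX_PRICE_PER_TB` by an order of magnitude is a quick way to see how sensitive the total is to the chassis guess.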

    My $.02

    Tyler Ward
    tjw19@columbia.edu

  19. Re:Cost v Speed by Score+Whore · · Score: 3, Insightful
    The idea that all this is on DRAM is staggering. If the refresh stops (board failure, power problem) the data is just GONE?!


    Google doesn't create content. They are a search engine. Nor are they in the business of archiving the net for posterity. If they lose data, it's out there to be recollected or if not, then there's no point in them saving it anyway.
  20. Re:Take a BUSINESS perspective (yes, it's painful. by Colz+Grigor · · Score: 3, Insightful
    One other follow-up:

    Google will also likely break their technology into three components:

    spidering and indexing

    searching

    caching

    Each of the financial analysts for the business groups responsible for each aspect of Google's technology may calculate the value of DRAM vs. HD differently. For searching, latency is extremely critical, but it's not so critical for caching, and there may be some physical problems with solely using DRAM for indexing.

    That being said, I would expect Google to use HDs for spidering and indexing, DRAM for searching, and HDs for caching. Mr. Schmidt was probably only discussing technology on the most visible component of Google's technologies: searching.

    ::Colz Grigor

  21. Re:Index space? by spiro_killglance · · Score: 3, Insightful


    I don't know how Google does it. But typically the main overhead is the inverted file: for every word on every page, you just need the number of the page it was in and the word's position on that page. So Google needs around 8-12 bytes per (non-stoplisted) word.
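
    A toy version of such a word-to-position index, as a sketch only. The stoplist, the sample pages, and the choice of two 4-byte integers per entry are assumptions made for illustration; nothing here describes Google's actual format:

```python
import struct
from collections import defaultdict

STOPLIST = {"the", "a", "of", "and"}   # hypothetical stoplist
POSTING_FORMAT = "<II"                 # assumed layout: 4-byte page number, 4-byte position


def build_index(pages):
    """Map each non-stoplisted word to a list of (page_number, word_position)."""
    index = defaultdict(list)
    for page_number, text in enumerate(pages):
        for position, word in enumerate(text.lower().split()):
            if word not in STOPLIST:
                index[word].append((page_number, position))
    return dict(index)


pages = ["the cost of dram", "dram and disk cost"]
index = build_index(pages)
bytes_per_posting = struct.calcsize(POSTING_FORMAT)

print(index["dram"])        # where "dram" occurs
print(bytes_per_posting)    # size of one packed entry
```

    With two 32-bit fields each entry packs into 8 bytes, the low end of the 8-12 byte range; a wider page number would push it toward the high end.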

  22. Silly people! by m.dillon · · Score: 3, Insightful

    You guys crack me up sometimes.

    I'll lay it out. Obviously Google is not storing the master copy of the full multi-terabyte database in RAM, but they are certainly storing as big a chunk in RAM as they can, and the cost model ought to be easy for anyone to understand if you sit down and think about it.

    Consider the cost difference between the following EQUAL amounts of hard disk storage:

    * A 160GB IDE drive

    * A 160GB SCSI drive

    * Four 40GB drives in an external RAID system

    * The cost of a small medium-performance RAID system.

    * The cost of a larger high-performance RAID system scalable to a terabyte.

    * The cost of an *EXTREMELY* high performance RAID system scalable to multiple terabytes.

    Now consider the cost of building, say, a 40 terabyte data store (let's not worry about backups for this experiment). If you build it out of a bunch of huge SCSI drives connected to a bunch of PCs it can be fairly cheap. But if you build it out of, say, high performance EMC arrays it could cost millions of dollars more to get the same theoretical performance.

    So when you consider the cost of storage, you always have to consider the cost of the PERFORMANCE you want to get out of that storage. All the Google CEO is saying is that, Doh! It's a hellofalot cheaper to improve the performance aspects of the system by buying DRAM in a distributed-PC environment in order to be able to avoid having to purchase extremely-high performance (and extremely expensive) disk subsystems. The cost of purchasing the DRAM to make up for the lower-performing disk subsystem is actually LOWER than the cost of purchasing an equivalent higher-performance disk subsystem.

    The same is true in the ISP world. When RAM was expensive we had to rely on big whopping HD systems to scale machines up. But when RAM became cheap it turned out that you could simply throw in a very high density drive with 1/4 the performance that four smaller drives would give you, and the operating system's RAM cache would take care of the problem. Suddenly we no longer needed to purchase big whopping disk arrays.
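
    One way to see why a RAM cache can stand in for faster disks is the average access time at a given cache hit rate. This is a sketch with assumed latencies (about 10 ms for a disk read, about 100 ns for RAM), not measured values:

```python
# Assumed latencies, for illustration only.
DISK_NS = 10_000_000   # ~10 ms per random disk read
RAM_NS = 100           # ~100 ns per cached read


def average_access_ns(hit_percent):
    """Average read latency when hit_percent of reads are served from RAM."""
    miss_percent = 100 - hit_percent
    return (hit_percent * RAM_NS + miss_percent * DISK_NS) // 100


for hit in (0, 90, 99):
    print(f"{hit}% hits: {average_access_ns(hit):,} ns average")
```

    With these numbers, a 90% hit rate cuts the average latency roughly tenfold, which is the kind of gain that would otherwise take a much faster (and pricier) disk subsystem.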

    Think about it.

    -Matt