Slashdot Mirror


Google Prefers DRAM to Hard Disks

KP writes: "I came across this interview with Google's CEO. A very interesting read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

354 comments

  1. I can see it now... by AcidDan · · Score: 2, Offtopic

    In the hallowed halls of Google... Row upon row of uber-boxen with a Bagillion megabytes of ram...

    Then someone trips over the power chord...

    -- Dan =)

    1. Re:I can see it now... by yobbo · · Score: 1

      Haven't u heard of a UPS?

      A....really... really......... bloody huge UPS

    2. Re:I can see it now... by Egonis · · Score: 2, Funny

      Actually... when I worked at Internet Direct (in Toronto, Canada) one of the NetAdmins shut down a DNS Server with his ass when he backed into a Netfinity box.

      So where is your UPS NOW?

    3. Re:I can see it now... by Dun+Malg · · Score: 1

      I once shut down a huge travel agency's datacenter (multiple servers PLUS their entire phone system) by bumping the switch on the UPS cabinet with my ass. A UPS is useless when some idiot removes the safety cover from the main power switch so some other idiot like me can run into it. I hit that switch and shut down every machine in the room. That room got quiet, and all the rooms outside got very loud...

      --
      If a job's not worth doing, it's not worth doing right.
    4. Re:I can see it now... by Cramer · · Score: 1

      Funny! I've seen crazy stuff like this all over the place. The UPS has to be inside the computer to be safe. No amount of redundancy can protect against the power switch...

      Micah powered down the NetApp feeding the web servers at Interpath years ago... He turned it back on immediately *grin* Just days earlier I had been talking to Network Appliance about the hair trigger nature of that switch -- a big, black switch in the center of the front of the box (an F210, btw.) I taped an angle bracket over it the next morning.

      Even longer ago at Interpath, we also had a modem bank (Microcom HDMS/fast -- huge power hungry PoS) powered down by a cleaning lady's mop once.

      After Interpath moved the office, we had half the server room powered down by a short in one of the cubes. It was on a completely different circuit; however, the breakers are magnetically thrown. When that one went, it popped the breakers next to it as well. All totaled, I think there were six circuits cut off. Even the E450 powered on two different (UPS backed) circuits got cut off.

      I had a network switch powered down by a vacuum cleaner when I was at Make Systems. My workstation housed the clearcase view for a patch or optimized build (I forget which.) I replaced it with a different switch that didn't have an on/off switch.

    5. Re:I can see it now... by MadFarmAnimalz · · Score: 0, Offtopic

      http://www.google.com/search?hl=en&q=%22trip+over+power+cord%22+google Gee, zero hits returned... Doesn't look like they have any contingency plans for this one. Well spotted, mate!

      --
      Blearf. Blearf, I say.
    6. Re:I can see it now... by Anonymous Coward · · Score: 0

      Hmmm, as an alternative to this scenario of 'uber-boxen' consider this.

      A wall of racks containing fast cpus, quick-boxen, connected to something like Imperial Technology's Megaram 5000 (ie. RAM based) file cache accelerators storing the indices and even possibly the cached content. Each Megaram 5000 takes up 5 rack units to provide up to 51GB of .035msec access time storage to 8 fiber channel and 16 ultra-2 SCSI connected switches while consuming/dissipating a maximum of 385 Watts of internally UPS protected (ie. no power cord to trip over and lose data over) power.

      So "power tripping" isn't a problem for the large index -such as Google may have- requiring several 51GB file caches. So this potential dissonant chord would have to go to the switches or quick-boxen connected to the switches. So we have a reasonable UPS dedicated to each switch/multiple quick-boxen rack with overhead wiring or sub-floor wiring.

      Now if somebody is walking on the ceiling :) they might take out one rack which could potentially cause some searches to fail....

      I can hear it now...
      "Honey, I'm searching for the NeedlessMarkup Chocolate Cookie recipe and nothing's happening." "Oh, just try again." [Meanwhile back at the farm.] "Bob, I think Jim is the only person alive physically brilliant enough to walk on his hands on the ceiling and yet clumsy enough to trip over a power cord doing it."
    7. Re:I can see it now... by (outer-limits) · · Score: 1

      My workmates once powered off a mainframe when they were playing cricket. The ball was right onto the power switch.

      --

      Microsoft - Where would you like to go today, Maybe Jail?

    8. Re:I can see it now... by djlosch · · Score: 1

      actually what i think the ceo has concluded is that with the tiny seek times (of course dram has minuscule seek times compared to hd) the site is bogged down by fewer users at any given time, keeping the site faster. faster site = better advertising medium = more money from advertisers. this may be more expensive initially but over a few months definitely makes up for it when the topic is overhead cost on comparison of hd to dram. i dont know of any other viable explanation. even if i hit up pricewatch, i still cant get anything nearly comparable (in dram) to the 130 USD for 60 gigs.

    9. Re:I can see it now... by Sarcasm_Orgasm · · Score: 0

      Several? Try hundreds. Ever seen http://groups.google.com/ ? That alone has to be terabytes+

      --
      Special people have long socks, ride short buses, & invent witty sigs.
    10. Re:I can see it now... by cloudmaster · · Score: 2

      I wonder if they'd use SCSI or, snicker, IDE drives in those machines? Compare those costs, and the difference gets much less.

      Not that I wanna get into an argument over which is better, but I'd bet that they'd rather have their CPUs crunching numbers than wasting their time waiting for an unintelligent BUS to catch up... :)

    11. Re:I can see it now... by klez23 · · Score: 1

      of course, if each machine is faster at serving requests, you need fewer machines to handle the total number of requests, right?

    12. Re:I can see it now... by Anonymous Coward · · Score: 0

      Mr. Cloud: OK, first off - I used to be a SCSI fanatic, my first hard drive was a 80 Meg ST1096N I bought with 2 years of savings for an Amiga 500. SCSI kicks ass in the server room, but it absolutely sucks for home use. Why, you might ask?

      1. You don't waste all of that theoretical bandwidth like you do with IDE. A more streamlined protocol.

      No, but there aren't any devices out there that are going to suck up all that bandwidth on the consumer side of the market, either.

      3. Did I mention that a UW SCSI bus can have 15 targets (i.e. drives) with one initiator (i.e. controller), while IDE is still stuck at 2 per bus...

      I dare you to try and get 15 SCSI devices from different manufacturers and different scsi revisions running reliably on the same bus. I had a couple different hard drives and CDROMs on a wonder scsi bus - it was a wonder why and when it would work.

      4. Disconnect.

      It'd be great if devices would disconnect..

      5. Smarter devices - they do more for you.

      Great idea - works well with scanners - doesn't work so well when the devices start to compete and see who's smartest, which is usually done instead of them doing what they're supposed to. Doesn't happen with IDE.

      8. Cable length.

      Do you know how hard it is to find scsi cables and adapters in most cities - and how much you'll pay for them? UGH.

      12. You can run it as a lan...

      Now this is cool; We did this with some amigas back in the day. Of course, ethernet cards work better now :).

      Overall, my experience with SCSI was hell on earth. Trying to make a scsi cdrom, burner, hard drive(s), and other goodies from different manufacturers work and boot Windows was hell on earth; I haven't had any such problems with IDE, the controllers are there, and I don't need more than 2 hard drives and 2 removable media devices in any box (I'll just buy another box and set up a server).

    13. Re:I can see it now... by Shanep · · Score: 2

      Guys, the UPS setup you use probably seems insignificant if the land line providers you use balls stuff up every now and then. ; )

      In 1991, I was working for .au Telecom's Digital Data Network team. I was a trainee at the Haymarket exchange and we were doing some old cable removal. Thick (ish, thick if compared with UTP, thin if you're thinking inter-continental submarine data cables) cables servicing many big companies. We tended to sell leased lines and packet switched lines to large, important companies. Small companies could not afford the prices we charged for so much as a 300bps connection and ISDN we were charging crazy money for back then. So, when my boss told me to cut that fat cable, I double checked his request and then cut it with a big cable cutter (kinda like a bolt cutter, except for cables)....

      A couple of banks local to Haymarket Sydney... DOWN. The Haymarket TAB (sports betting)... DOWN. Various other angry customers down too.

      Man I wish I could have known what I was cutting before I did, so I could enjoy it a bit more. ; )

      Plenty of guys, including myself, that night were doing unpaid overtime. The union would have loved to hear about that. DDN was always touting these incredible uptimes for their services, yet they were not really that great. That boss of mine was a real fuckwit (ex army arsehole) anyway.

      --
      War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
    14. Re:I can see it now... by Shanep · · Score: 2

      PS, when I was working for the stock exchange, I was glad to see that the main and backup sites had computer and phone data all redundant through landline and microwave links.

      A lot of Co's use microwave in au and I guess it's not just because it's cheaper in the long run!

      (BTW, the ASX microwave link did go down once that I know of, when construction work between the sites had a large crane sometimes blocking the line of sight.)

      --
      War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
    15. Re:I can see it now... by Anonymous Coward · · Score: 0

      Bah. All you need is a 486 DX4/100 with 32MB RAM so you don't waste processor speed. Then you can set up some "servers" to hold your warez and pirated mp3z because your motherboard only handles 3 2048MB drives and a cd-r. Meanwhile, my CPU is faster than I need most of the time (distributed.net takes care of that) and my hard drives are faster and more reliable than IDE crap will ever be - and I've got full bus speed access to all of them, not 10Mb/s cat3 that I stole from the dumpster behind the local office building. While your hard drive access is choking your CPU because of the stupid drives generating interrupts left 'n right, my SCSI card is letting my CPU keep real-time a/v real-time.

      Go back to your slow car, slow computer, minimum-wage job, and 1-room apartment. They're "good enough" for you because they "get the job done adequately". I'll stick with the best I can get. :)

  2. Cost v Speed by JohnHegarty · · Score: 1

    I think it's only cheaper on a cost versus speed basis. I am sure the google archive is only a few 100gb, and that's not too much to buy in ram. A hard disk would be cheaper but a lot slower, costing the company extra money in the long run.

    1. Re:Cost v Speed by Space+cowboy · · Score: 5, Interesting
      JohnHegarty scribbled

      I am sure the google archive is only a few 100gb


      Err. No.

      I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250Gb of disk space for that minuscule cache. The one at Google must be of the order of a few hundred Terabytes, not Gigabytes.

      On that basis, I echo the original query about how it can be economical to use RAM...

      Simon
      --
      Physicists get Hadrons!
    2. Re:Cost v Speed by PhotoGuy · · Score: 2
      I am sure the google archive is only a few 100gb

      Huh? I would have thought it would have been between 10x to 100x that much. Especially if they cache most pages. (Maybe they just use dram for the indexes, and hd's for the cache?)

      I still don't understand that claim. $300 will get me a 160G drive, and I can load four of them in a cheap PC case or 1U rackmount case, 640G per unit. That's under $2K for .64 Terabyte.

      RAM prices vary widely, but say on the low side I can get 256M for $20. I'd need 2560 sticks of 256M to equal 640G, or $51,200 for the equivalent storage. And that doesn't take into account that most reasonably priced PC motherboards only handle 2G or 4G of memory these days. You'd need 160 motherboards in the best case, adding $80,000 to the cost, assuming you could get 4G per unit, and $500 per motherboard/chassis. Let's see, $51K+$80K = $131K, versus $2K.

      RAM, as I figure it, is at least 65 times more expensive (that's not 65% more, it's 6500% more).
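
      The arithmetic above can be replayed as a short sketch. Every price and capacity is the figure quoted in this comment (not market data), and the variable names are only illustrative:

```python
# Back-of-envelope comparison using only the figures quoted in the comment.
DISK_GB = 160             # capacity of one hard drive
DISKS_PER_BOX = 4         # drives in one cheap case
DISK_BOX_COST = 2_000     # "under $2K" for the whole disk-based box

STICK_MB = 256            # capacity of one RAM stick
STICK_COST = 20           # low-side price per stick
RAM_PER_BOARD_GB = 4      # best-case RAM per motherboard
BOARD_COST = 500          # motherboard plus chassis

target_gb = DISK_GB * DISKS_PER_BOX            # 640 GB to match
sticks = target_gb * 1024 // STICK_MB          # RAM sticks needed
ram_cost = sticks * STICK_COST                 # cost of the RAM alone
boards = target_gb // RAM_PER_BOARD_GB         # motherboards needed
total_ram_cost = ram_cost + boards * BOARD_COST
ratio = total_ram_cost / DISK_BOX_COST

print(f"{sticks} sticks, {boards} boards, ${total_ram_cost:,} vs ${DISK_BOX_COST:,}")
print(f"RAM solution costs about {ratio:.1f}x the disk solution")
```

      This only compares dollars per unit of capacity; it says nothing about how many queries each setup can serve.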

      Either their archive is a lot smaller than I assumed, or they're talking performance/price tradeoffs, where speed has a high premium.

      -me
      --
      Love many, trust a few, do harm to none.
    3. Re:Cost v Speed by DrXym · · Score: 2

      A few 100gb to cache the entire internet?

    4. Re:Cost v Speed by ekrout · · Score: 1

      On that basis, I echo the original query about how it can be economical to use RAM...

      Google's like the "Neuman" (Seinfeld) of cyberspace. "When you control the web, you control the world!"

      They probably just trade higher search status for sticks of RAM.

      --

      If you celebrate Xmas, befriend me (538
    5. Re:Cost v Speed by Anonymous Coward · · Score: 1, Interesting

      Imagine that, to keep search query times at an acceptable level, you need 4 boxen with hard disks to perform as fast as 1 box with a wedge of RAM. So the cost of the RAM for that single box means 3 boxen are no longer needed.

    6. Re:Cost v Speed by Alomex · · Score: 3, Insightful

      AFAIK, Google does not cache images, only HTML text. The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.

      Hence the size of the cache is somewhere between 500GB and 3TB, plus the index would be another 40% of that.

      My best guess is that the google archive is somewhere around 2-3 terabytes, and that the total amount of DRAM available at google at the present time is somewhere between 5-10 terabytes.
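
      As a rough check on the range above, here is a minimal sketch that multiplies out the quoted bounds. The function name and the idea of pairing low with low and high with high are assumptions made for illustration; the percentages and sizes are the ones stated in this comment.

```python
def estimate_range_tb(web_tb, text_fraction, index_overhead=0.40):
    """Return (cache, index, total) ranges in TB as (low, high) tuples.

    web_tb         -- (low, high) estimate of total web size in TB
    text_fraction  -- (low, high) share of the web that is text
    index_overhead -- index size as a fraction of the text cache
    """
    cache = (web_tb[0] * text_fraction[0], web_tb[1] * text_fraction[1])
    index = (cache[0] * index_overhead, cache[1] * index_overhead)
    total = (cache[0] + index[0], cache[1] + index[1])
    return cache, index, total


cache, index, total = estimate_range_tb((5, 10), (0.12, 0.30))
print(f"text cache: {cache[0]:.2f} to {cache[1]:.2f} TB")
print(f"index:      {index[0]:.2f} to {index[1]:.2f} TB")
print(f"total:      {total[0]:.2f} to {total[1]:.2f} TB")
```

      The low end works out to roughly 0.6 TB of text, which the comment rounds down to 500GB.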

    7. Re:Cost v Speed by andykuan · · Score: 4, Insightful

      It's important to note, though, that he states DRAM is more efficient (cost-wise? speed-wise? whatever) when it comes to storing seekable data. I wonder if that means they're using DRAM for their search indices and plain old disk for their cached content. DRAM is ideal for completely random access to multiple pieces of data, whereas disk does okay for serial access to data, the location of which is well known.

    8. Re:Cost v Speed by Yokaze · · Score: 2

      I think he (Eric Schmidt) spoke of storing the indices.
      Traditionally, they are only stored partially in RAM due to their size.

      Certainly, the unprocessed pages are still stored on HDs as one doesn't gain
      anything from storing them in RAM.

      --
      "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
    9. Re:Cost v Speed by leuk_he · · Score: 5, Interesting

      This makes more sense then:
      PC World: What are Google's biggest challenges?
      Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.


      If you have computers where 4 GB is not very much memory, and use for memory the amount we have on our HDs, I would have a DRAM shortage too.

      And I bet they store only the most frequently used part of the index in memory.

      Did you notice that accessing the Google cache is very slow compared to a search? Even if that cached page is accessed frequently (because it references a /.ed site).

    10. Re:Cost v Speed by dizzydogg · · Score: 1

      I seriously doubt that they keep the cached pages in ram. They probably keep them on hard drives, and only keep the keyword index in ram. I mean, how often is the cached pages feature used in comparison with the search feature?

    11. Re:Cost v Speed by jjeffers · · Score: 1

      I'm guessing that Google stores the indexes in memory, but not the cached versions of the sites. I remember reading that they have thousands of machines, so it is quite possible that they have 1000's X 4GB of memory in each, and then big fibre channel fabrics to load the memory and serve the images from.

      It would be really neat if Google would publish an updated technical overview.

      -Jim

    12. Re:Cost v Speed by tmalsburg · · Score: 1
      • Maybe hard disks have more power consumption.
      • Maybe hard disks fail more often.
      • Maybe the failure of a hard disk is much more expensive.
      • Do you have to replace a dram chip if it fails? Not necessarily. You just don't use it anymore.
      • If you use RAM you don't need hard disk controllers.
      • ...

      Titus
    13. Re:Cost v Speed by justinstreufert · · Score: 1

      Google does so cache images. :)

      The idea that all this is on DRAM is staggering. If the refresh stops (board failure, power problem) the data is just GONE?!

      Justin

      --
      "Why would God give us a waist if we wasn't supposed to rest our pants on it?" - Rev. Roy McDaniels
    14. Re:Cost v Speed by Alomex · · Score: 2

      Google does so cache images [google.com]. :)

      Cute, but not quite correct. They cache postage-stamp-sized copies. If you want the full image you have to go to the original web site.

      Granted, this does increase somewhat my original estimate of the amount of DRAM required.

    15. Re:Cost v Speed by Space+cowboy · · Score: 5, Informative
      Alomex wrote:

      The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.


      I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.

      As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European Union site http://europa.eu.int. This is a fairly innocuous site, but if I do:



      cd /opt/search/var/sites/26_europa.eu.int
      du -sk .
      7731586 .


      That's a 7.7Gb website, and that's just the text (in fact I only search for .htm, .asp, .php* and .html files). This particular website is growing at the rate of a couple of hundred Mb each month.

      I just think that your estimate for the cache size is a long way short of the real figure...

      Simon
      --
      Physicists get Hadrons!
    16. Re:Cost v Speed by Anonymous Coward · · Score: 0

      Makes sense. They're "Searching 2,073,418,204 web pages", each of which occupies how many bytes in index files? 50-100 networked machines worth? No big deal, just give me a few FreeBSD boxen and consider it done. (I dont think Linux can do it because it'll require LAN IP tokens and 36bit addressing. So much for Open Source.)

    17. Re:Cost v Speed by Graymalkin · · Score: 2

      Your single box for 2000$ doesn't take into consideration the fact that Google needs to make their tons of information available to everyone at once. With a search engine like Google, it is going to be rare that information just sits around and never gets used. This means that by conventional database architecture logic you keep it cached in RAM. Hard drives are useful when you're cutting power to a computer; how often does Google reboot?

      --
      I'm a loner Dottie, a Rebel.
    18. Re:Cost v Speed by Alomex · · Score: 2

      I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.

      Indeed, this has been a hot area of debate for the last 7 years or so, when the first paper with a substantially larger web than that indexed by search engines came out.

      Usually search engines estimate the web size to be about 15-30% of that claimed by statistical measurements.

    19. Re:Cost v Speed by kyrre · · Score: 1

      1. Google runs on linux
      2. FreeBSD is Open Source
      3. I should not respond to trolls

    20. Re:Cost v Speed by Anonymous Coward · · Score: 0

      Google came to my campus recently. They now have over 10,000 Linux nodes and have a petabyte of data.

    21. Re:Cost v Speed by wwwillem · · Score: 1


      RAM, as I figure it, is at least 65 times more expensive (that's not 65% more, it's 6500% more).

      Well, let's assume for a second that your calculation is correct. Then, if the DRAM solution is 100 times faster (sorry, no data to support that claim, but let's assume) than the HD solution, you see why the DRAM way is more cost-effective than the HD way.

      --
      Browsers shouldn't have a back button!! It's all about going forward...
    22. Re:Cost v Speed by Anonymous Coward · · Score: 1, Insightful

      and that's just the text (in fact I only search for .htm, .asp, .php* and .html files).

      If you're pulling php and asp files you very well could be pulling their database, not just "Web" pages. I run a web server for an online store which consists of a few (15-20) meg of phtml/html/gif/jpg, but if you try to mirror the site you will cycle through our entire mysql database of products and end up with a couple of gig of dynamically generated pages.

    23. Re:Cost v Speed by zerocool^ · · Score: 2

      Hrm...

      So this is why SDRAM prices have been going up and not down lately...

      Bastards...

      ~z

      --
      sig?
    24. Re:Cost v Speed by jovlinger · · Score: 3, Interesting

      Just a thought:

      When is it worthwhile to trade off cpu for storage? In your case, I suspect that the website has a degree of redundancy in its 7 gigs of data; there is likely much duplication, both at the page level (duplicated CSS info) and at the snippet level (duplicated copyright disclaimers).

      It is quite straightforward to discover this sharing (IIRC exactly how lzw compression works, but w/ a smaller window) and significantly cut down your storage costs. Of course, now you have a CPU hit, where storing new data becomes expensive, and just reading the data requires some pointer chasing.

      The interesting issue is that the CPU hit isn't guaranteed to be a Bad Thing: your higher cache hit rate (indeed, your data may fit in ram entirely now) will possibly (likely?) result in significant speedups.
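
      A minimal sketch of the kind of sharing described above, assuming synthetic pages that repeat the same header and footer. It uses Python's zlib (DEFLATE, a sliding-window LZ77 scheme, not LZW) as a stand-in, and the page contents and the helper name make_pages are made up for illustration:

```python
import random
import zlib


def make_pages(count=20, seed=0):
    """Build synthetic pages sharing boilerplate but with unique bodies."""
    rng = random.Random(seed)
    boilerplate = (
        b"<html><head><link rel='stylesheet' href='/site.css'></head><body>"
        b"<div class='nav'>Home | Policies | Documents | Contact</div>"
        b"<div class='footer'>Copyright notice. All rights reserved. "
        b"Reproduction is authorised provided the source is acknowledged.</div>"
    )
    pages = []
    for _ in range(count):
        # Unique per-page content: a run of pseudo-random numbers.
        body = " ".join(str(rng.randrange(10**6)) for _ in range(30)).encode()
        pages.append(boilerplate + b"<p>" + body + b"</p></body></html>")
    return pages


pages = make_pages()

# Compress every page on its own: the boilerplate is paid for each time.
separate = sum(len(zlib.compress(p)) for p in pages)

# Compress all pages as one stream: repeats can point back at earlier copies.
blob = b"".join(pages)
together = len(zlib.compress(blob))

print(f"raw: {len(blob)} bytes, per-page: {separate} bytes, shared: {together} bytes")
```

      The catch is the one already named: with a shared stream, reading a single page means decompressing everything before it, so the storage saving is bought with CPU time.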

    25. Re:Cost v Speed by BlueOtto · · Score: 1

      For some real figures on how much space it is to cache the web, check out The Wayback Machine. All of their caches come out to 100TB. They have more info on their site about different times and how much storage it takes them.

    26. Re:Cost v Speed by mstrjon32 · · Score: 1

      remember also the cost of replacing failed drives, as opposed to a quality DRAM module that (under proper conditions) would never fail.

    27. Re:Cost v Speed by Yokaze · · Score: 3, Informative
      > each of which occupies how many bytes in index files?

      According to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page, the inverted index ("inverted barrels") was about 47.2Gb large (Total data without repository 55.2Gb, Repository 53.5Gb). It had about 24 Million web pages indexed. Assuming a linear increase this amounts to about 5Tb.
      But, to quote from the paper:

      With better encoding and compression of the document index, a high quality web search engine may fit onto a 7Gb drive of a new PC.

      Which is surely slightly exaggerated, but shows that they considered that there is room for improvement. (E.g using varying length index instead of fixed width)

      >I dont think Linux can do it
      At least they think it can do it, since they are using Linux boxes, at least according to

      The Technology Behind Google, by Jim Reese CEO.
      More than 10,000 Linux boxes, that is.
      --
      "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
    28. Re:Cost v Speed by Anonymous Coward · · Score: 0

      I wondered why when I search for "busty redheads" I get page after page of Micron links

    29. Re:Cost v Speed by ahde · · Score: 1

      actually, its 6400% more.

      100% more is 2 times. ;)

    30. Re:Cost v Speed by kesuki · · Score: 2, Interesting

      Google doesn't cache images, Google doesn't index or cache dynamic (scripted) content, and Google caches PDFs as plaintext.
      However, they are definitely on the scale of terabytes. "Searched the web for a.
      Results 1 - 10 of about 1,470,000,000. Search took 0.31 seconds." Assuming an average of ~25k cached per link, 1.47 billion links would leave a cache of about 37,632,000,000,000 bytes. However, the cache doesn't necessarily need to be stored on RAM disks. He clearly states that it's 200,000 times more efficient for _seekable_ data. This means not the 'cached' data but rather the stuff that the search algorithm looks at to show you appropriate hits. So the heart of the 'search' engine is using RAM exclusively, but 'cached' data would almost certainly still be stored on HDs, unless of course someone has built Google a bunch of 120GB DRAM disks that use conventional HD interfaces (sorta like the flash memory drives, only on steroids when it comes to speed).
      It could even be misleading: Google could have meant flash memory HDs were cheaper but mistakenly referred to them as DRAM.
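
      The cache-size figure in this comment can be reproduced directly, taking "25k" to mean 25 x 1024 bytes (an assumption, but it is the reading that matches the quoted byte count):

```python
# Result count reported by the quoted search, and the assumed average size
# of one cached page ("~25k", read here as 25 * 1024 bytes).
links = 1_470_000_000
avg_cached_bytes = 25 * 1024

cache_bytes = links * avg_cached_bytes
cache_tb = cache_bytes / 10**12      # decimal terabytes
cache_tib = cache_bytes / 2**40      # binary tebibytes

print(f"{cache_bytes:,} bytes = {cache_tb:.1f} TB = {cache_tib:.1f} TiB")
```

      Either way of counting puts the cached pages in the tens of terabytes, well beyond the index sizes discussed elsewhere in the thread.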

    31. Re:Cost v Speed by Anonymous Coward · · Score: 0
      Granted, this does increase somewhat my original estimate of the amount of DRAM required.

      Not necessarily -- it would make a lot of sense for them to store only the searchable data in DRAM -- i.e. the text. That's where the fast access is most needed.

    32. Re:Cost v Speed by Anonymous Coward · · Score: 0

      Disks fail more often than RAM. Maintenance costs.

    33. Re:Cost v Speed by Anonymous Coward · · Score: 0

      That pretty much nails it. Basically, to get comparable performance from a disk-based solution for the indices, you'd probably need a mega-RAID from hell, with mondo replication to handle many requests in parallel.

    34. Re:Cost v Speed by jim.robinson · · Score: 1

      I don't know who came up with that size, but it cannot be right -- or else it just isn't current. I looked at the disk usage for our web sites (my group runs a not-for-profit web publishing service), and we use about 1Tb of space for our content.

    35. Re:Cost v Speed by BinxBolling · · Score: 2
      RAM, as I figure it, is at least 65 times more expensive (that's not 65% more, it's 6500% more).

      The data isn't just sitting there static, though: It's being searched. To switch to hard drives and maintain their current performance level, they would have to increase the parallelism of the search, by having many more copies of the index. One copy of the index on disk is not really equivalent to one copy of the index in DRAM, because the DRAM index can be searched many times in the period it takes to search the HD index once.

      The quantity they're trying to minimize is not dollars per megabyte, but rather dollars per (megabytes searchable per second).
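
      A minimal sketch of that metric follows. The per-megabyte prices are the rough figures quoted elsewhere in this thread (about 30 cents for RAM, 0.2 cents for IDE disk); the index size, the disk search rate, and the 1000x speedup are purely hypothetical placeholders, and the function name is made up for illustration.

```python
def dollars_per_throughput(dollars_per_mb, index_mb, mb_searched_per_sec):
    """Cost of one index copy divided by how fast that copy can be searched."""
    return dollars_per_mb * index_mb / mb_searched_per_sec


INDEX_MB = 40 * 1024          # hypothetical index size: 40 GB
DISK_RATE = 10.0              # hypothetical MB searchable per second from disk
RAM_SPEEDUP = 1000            # hypothetical: RAM searched 1000x faster

disk = dollars_per_throughput(0.002, INDEX_MB, DISK_RATE)
ram = dollars_per_throughput(0.30, INDEX_MB, DISK_RATE * RAM_SPEEDUP)

# RAM wins on this metric whenever its speedup exceeds its price ratio.
price_ratio = 0.30 / 0.002
print(f"disk: ${disk:.2f} per MB/s, RAM: ${ram:.2f} per MB/s")
print(f"break-even speedup: {price_ratio:.0f}x")
```

      With these placeholder numbers RAM comes out cheaper per unit of search throughput even though it is far more expensive per megabyte; with a speedup below the price ratio the conclusion flips.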

    36. Re:Cost v Speed by Space+cowboy · · Score: 2

      Sorry, I wasn't being clear - I forgot to point out that these files are already compressed (using gzip), but only on an individual file basis. The real site is significantly larger than this 7.7Gb, and I should have mentioned that.

      Whereas I agree that we're getting close (or maybe have passed) the point where it would make sense to do something better, since I don't have much of a budget, and disk is cheap ....

      ATB,
      Simon.

      --
      Physicists get Hadrons!
    37. Re:Cost v Speed by Sj0 · · Score: 2

      Consider other things though. While the initial cost is high, electrical power for 1GB of RAM is lower than that of 1GB of hard drive, and since RAM is solid state, maintenance costs would be tiny. Imagine the costs of keeping a few hundred hard drives, each rattling away 24/7, from dying out!

      --
      It's been a long time.
    38. Re:Cost v Speed by Score+Whore · · Score: 3, Insightful
      The idea that all this is on DRAM is staggering. If the refresh stops (board failure, power problem) the data is just GONE?!


      Google doesn't create content. They are a search engine. Nor are they in the business of archiving the net for posterity. If they lose data, it's out there to be recollected or if not, then there's no point in them saving it anyway.
    39. Re:Cost v Speed by Cramer · · Score: 1

      Not to start a SCSI vs. IDE war... I don't think Google is crazy (read: stupid) enough to bank their business on the cheap-ass commodity IDE drives that last a year or so. It doesn't matter that Maxtor will replace the PoS for three years; you still have to replace it and rebuild or recollect the data.

      As for speed and cost, SDRAM is about 6 orders of magnitude faster than even the fastest hard drives -- nanosecond vs. millisecond. Costwise, he's smokin' some mighty weed; SDRAM is 2 orders of magnitude more expensive than the most expensive SCSI drive in the world (the 180G Seagate Cheetah.) RAM is about 30 cents per meg and hard drives range from .2 (IDE) cents to .8 (SCSI) cents per meg. In the end, one needs boatloads of both.

    40. Re:Cost v Speed by Cramer · · Score: 1

      HD seek time: 4.9 MILLIseconds
      SDRAM "seek" time: 5 NANOseconds

      Therefore RAM is approx. 1 MILLION times faster. (DDR SDRAM is even faster.)
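
      The ratio works out as follows, using only the two latencies quoted above (expressed in whole nanoseconds so the division is exact):

```python
import math

HD_SEEK_NS = 4_900_000    # 4.9 milliseconds, in nanoseconds
RAM_ACCESS_NS = 5         # 5 nanoseconds

ratio = HD_SEEK_NS / RAM_ACCESS_NS
orders_of_magnitude = math.log10(ratio)

print(f"RAM is {ratio:,.0f}x faster, about {orders_of_magnitude:.2f} orders of magnitude")
```

      That is just under a million, so "approx. 1 MILLION" and "about 6 orders of magnitude" both hold as round figures.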

    41. Re:Cost v Speed by SplatFileGoo · · Score: 1

      >I am sure the google archive is only a few 100gb

      The index was reported last year to live on one 80 gig drive per machine (that was before the last jump from 1.3b pages to 2.0b pages). It works out to 5000-7000 pages per MEG of indexed (html stripped) compressed data. The "cached" pages are stored separately on Google's proprietary "big file" formatted disks (a random access file system).

    42. Re:Cost v Speed by rogergregory · · Score: 1

      This one has two levels deep of nonintuitiveness.

      Most search engine sites have a lot more load searching the net than answering queries. The net is still the same size but there are only a few queries a day. Google is a different kind of beast.

      Google has datacenters full of machines spread over the world answering queries, but still needs to spider and archive the web once. Their query servers have a complete index. They are packaged about 40 to a rack, and the racks replicated as needed.

      A search engine spends almost all of its time pulling indexes off of the disk. That's how the factor of 200,000 comes in. I forget the size of the index I was quoted, but it was in the range of 40Gb within a factor of two. I expect that putting the index in RAM will shift the bottlenecks elsewhere from the index but the breakeven performance gain is probably ($40,000/server with 40Gb DRAM)/($2000/server) = 20 and a lot of other factors weighing on the side of fewer servers.

      Note that transitory data failures can be refreshed out of disk. And that with over 8000 servers (an old figure) they would still have lots of servers to distribute.
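
      A rough sketch of that breakeven reasoning, using the two per-server prices quoted above (which divide out to 20). The fleet size of 8000 is the old figure mentioned above; the 200x effective speedup is purely a made-up illustration, since the real gain after the bottleneck shifts is unknown.

```python
import math

DRAM_SERVER_COST = 40_000  # dollars per server with ~40 GB of DRAM (figure from the post)
DISK_SERVER_COST = 2_000   # dollars per disk-based server (figure from the post)


def breakeven_speedup(ram_cost: int, disk_cost: int) -> float:
    """Number of disk servers one RAM server must replace to cost the same."""
    return ram_cost / disk_cost


def ram_fleet_cost(disk_servers: int, speedup: int) -> int:
    """Cost of a RAM fleet doing the work of `disk_servers` disk machines,
    assuming each RAM server replaces `speedup` of them (hypothetical)."""
    return math.ceil(disk_servers / speedup) * DRAM_SERVER_COST


print("breakeven speedup:", breakeven_speedup(DRAM_SERVER_COST, DISK_SERVER_COST))
print("disk fleet cost:  ", 8000 * DISK_SERVER_COST)
print("RAM fleet cost:   ", ram_fleet_cost(8000, speedup=200))
```

      Anything above the breakeven ratio favors RAM on hardware cost alone, before power, space and networking are counted.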

    43. Re:Cost v Speed by Space+cowboy · · Score: 2

      It's a risk, but the problem is that other sites will intermingle .html and .php/.asp depending on whether there is any customisation or even just for headers and footers.

      In this case, almost all the documents are in fact dumps of pdf files also on the original site. I chose it because I knew it was big :-)

      Besides, for a search engine, getting the catalogue can be a useful thing - in the sort of targetted search engine that I'm maintaining, anyway. A lot of the searches are for particular mathematical models (mainly excel spreadsheets at exorbitant cost). These tend to be catalogued just like any other online shop...

      ATB,
      Simon

      --
      Physicists get Hadrons!
    44. Re:Cost v Speed by justinstreufert · · Score: 1

      OK, but would it not be a horribly lossy situation if they had to go down for a week while they re-spidered the entire Internet? ;)

      They may not be "creating" content, but this is their product! Think about it.

      Justin

      --
      "Why would God give us a waist if we wasn't supposed to rest our pants on it?" - Rev. Roy McDaniels
    45. Re:Cost v Speed by CrabCakeJimmy2k · · Score: 0
      I really cannot agree with you on the reliability of SCSI HDs vs. IDE HDs. Not too long ago I worked for a company in MN that used SCSI and IDE. The thing is that there were about 20 or so IDE drives to every 1 SCSI drive, but the dead drives box had three times as many dead SCSI drives in it as IDE.

      At home, I have never had an IDE drive die on me, but I have had all 3 SCSI drives that I have used in my personal machines die.

    46. Re:Cost v Speed by painkillr · · Score: 1

      You forgot to consider the possibility that they're not using IDE, it's more likely that they're running SCSI drives.

      So re-calculate your cost per MB w/ non-RAID SCSI and RAID SCSI. AFAIK, I haven't seen any 10k RPM SCSI drives 160MB in size.

    47. Re:Cost v Speed by painkillr · · Score: 1

      jeebus, that's 160GB, not 160MB.

    48. Re:Cost v Speed by LinuxInDallas · · Score: 1

      I've seen this before, why is it that everyone says "boxen"? Who started this? Excuse me, I'm off to eat some wieners...

    49. Re:Cost v Speed by psamuels · · Score: 1
      I've seen this before, why is it that everyone says "boxen"? Who started this?

      It's by analogy to vaxen, itself of course an analog of oxen, which is legit.

      I think boxen was a massively parallel coinage. I'm pretty sure I started using it before seeing anyone else's usage, but I hardly think I'm influential enough to have been the originator. (Same story with Pentia meaning more than one Intel 586.)

      --
      "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
    50. Re:Cost v Speed by Cramer · · Score: 1
      Just a guess, but the IDE drives were in desktops that were not running under any measurable load 24/7. And the SCSI drives were old in comparison.

      I've had both SCSI and IDE drives die in various inventive ways. IDE has proven to be far less reliable (tho' very much so cheaper) ...

      I've never had a SCSI drive be defective right out of the box. However, 4 out of 16 80G IDE drives for a 1TB array were defective right out of the retail box. Two of their factory replacements were defective.

      I've never had a new SCSI drive fail within hours or days of going into service. I've got a new Maxtor IDE drive to RMA that failed *4 hours* after being put in service.

      Over the last decade, I've replaced seven SCSI drives -- 4 completely dead, 1 field repairable (low level format), and 2 perfectly functional tho' noisy as hell. Over the same decade, there have been dozens of defective IDE drives thrown away -- thrown heads, bad sectors in random locations, bad sectors in the partition area, drives that can no longer track, drives that no longer spin up, drives that no longer spin at a stable speed, and (my personal favorite) the drive that works for exactly three days.

      There's no secret, I like SCSI. It's a perfectly clean, extensible protocol and has been from day one. IDE is a hideous, horrible stack of kludges. Over the past few years, there have been more and more crusty band-aids strapped on to give it more and more of the features and performance SCSI has enjoyed for years. IDE is cheap in more ways than just price.

    51. Re:Cost v Speed by nukebuddy · · Score: 1

      Sj0 wrote:
      While the initial cost is high, electrical power for 1GB of RAM is lower than that of 1GB of hard drive,

      1GB of RAM consumes about 5-10 watts. 1GB of hard drive has negligible power consumption in comparison.

      -nb

    52. Re:Cost v Speed by Feyr · · Score: 0

      the last time i heard, google used just over one PETABYTE of storage (with all the redundancy) and that was a few months ago. bet they've got some pretty neat discounts on that ram eh?

    53. Re:Cost v Speed by urth · · Score: 1

      I always thought it was because you add "en" to pluralise a word in german. So saying boxen is like saying "look at me, I am euro-cool, I say boxen"

    54. Re:Cost v Speed by Anonymous Coward · · Score: 0
      I've had both SCSI and IDE drives die in various inventive ways. IDE has proven to be far less reliable (tho' very much so cheaper) ...

      Right, IDE is cheaper and lower quality. No big surprise. Also there is so much competition, some vendors seem to simply ship untested drives (probably this gains money, even if it is inconvenient, and the drive is changed in a few percentage of the cases).

      Note that there is absolutely no reason why vendors couldn't just use the SCSI mechanics with IDE drives; it's just that there is no big market. But still, for Google who WILL IN ANY CASE have a sizable number of failed hard drives every day whether they are using SCSI or not, it makes sense to buy the 5 times cheaper IDE hard drives. They need a strong failover mechanism with fast recovery in any case, so they use it at full load.

      Over the past few years, there have been more and more crusty band-aids strapped on to give it more and more of the features and performance SCSI has enjoyed for years. IDE is cheap in more ways than just price

      The "more and more" part is partly wrong. DMA was a tremendous relief. Because electronics is now so much faster, the bottleneck is really the mechanics. Mechanics of SCSI disks are better than IDE on average (and that's what you are paying for, along with QA). That still means that a cheapo SCSI disk isn't worth more than a good IDE disk; it's not IDE vs SCSI, it's really low-quality disks vs high-quality disks.

    55. Re:Cost v Speed by mstrjon32 · · Score: 1

      Actually Quantum's Fireball Plus LM (and earlier) ATA drives had their mechanics directly related to their SCSI counterparts. On that note, the Plus LM was the most reliable ATA HDD I've ever owned, not only reliable, but fast as well.

    56. Re:Cost v Speed by Score+Whore · · Score: 1

      One would hope that they won't lose their entire server farm in one fell swoop. Losing a single server out of something like 4000 or 8000 isn't a big deal even if you lose all the data it had, when you are in the business that google is in.

    57. Re:Cost v Speed by markmoss · · Score: 2

      They're not going to lose refresh because of power failure. No matter what the storage technology is, you don't leave a server farm like Google's at the mercy of the local power grid, you have some sort of generators for backup.

      They _will_ lose bits of data. DRAM chips fail. Motherboards fail (taking out perhaps 2G at a time). Cosmic rays flip individual bits. It's much less lost at a time than HD fails, but probably the flipped bits occur far more often. But Google never guaranteed 100% accuracy...

    58. Re:Cost v Speed by Anonymous Coward · · Score: 0

      Nor are they in the business of archiving the net for posterity.

      I didn't know this was a business.

      Sounds interesting though...Maybe CMGI would fund it.

    59. Re:Cost v Speed by PhotoGuy · · Score: 2

      Actually, I thought I heard that Google uses single IDE drives, in a whack of distributed generic PC's. No SCSI involved.

      And as several other posters commented, *YES*, I AGREE, if speed vs. cost is a factor, then the 65x calculation is less relevant. But it'd take a heck of a lot of requirement for speed to overcome a 65x cost savings (put 30x more machines in place at half the cost, and get the performance you need, with the right architecture).

      And one of the most popular (my favorite) search engines might just mandate speed to the point that a 65x cost penalty is *well* worth it.

      Man, I wish people would *read* the posts in detail before posting. (Not that *I've* ever been guilty of that :-

      -me

      --
      Love many, trust a few, do harm to none.
    60. Re:Cost v Speed by Shanep · · Score: 2

      The idea that all this is on DRAM is staggering.

      I remember when AltaVista (back in 1996) was boasting that they had 1GB of RAM for their search engine. :)

      But RAM was so cheap up until recently and Google uses so many servers, that I think it probably would be cheaper for them to just work out of RAM. No disk or LAN medium can match RAM for access time, transfer rate and life span and these things are probably most important to Google.

      Trying to have extremely fast disk subsystems in each server in the Google farms would probably be very expensive, yielding much more space than required and much slower access to the space that is actually used.

      I don't think this comes down to the typical MB/$ comparison between disks and RAM because Google might only have a gig or so in each server, with lots of servers.

      If you're comparing a gig of really fast memory between RAM and disk, it is easy to see which is cheaper. A gig of RAM would have cost me a few months ago in Sydney ~$300.

      Whereas a gig of the fastest disk I could possibly get might cost me tens of thousands for a load of 15k RPM SCSI disks and a few 64-bit PCI hardware RAID-0 cards, and all I would get out of it is a measly 528MB/s transfer rate, probably half that of RAM. Access times for the RAM would be astonishingly faster than any disk, and the array would also come with many hundreds of gigs that will probably mostly not be required. Far too expensive, far too ineffective. Google needs fast access times and transfer rates the most, but the fastest of SCSI systems will have their transfer rates killed by zillions of very poor access times. Random access doesn't hurt transfer rates with RAM the way it does for disk.

      In the end, these machines would probably just end up being configured to each serve what they could cache with RAM so as to keep up with the demand, so why not just boot all these machines off a little flash disk and then just work the engine out of RAM?

      This doesn't just come down to needing RAM or disks that can transfer as fast as the network interfaces, since this is not a simple cache or file server. These servers need to search through their whole index as fast as possible, and doing this at super high speeds is going to be much more economical in RAM than on disk. I doubt Google could even be feasible working out of disks.

      --
      War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
  3. So that's how they do it... by svara · · Score: 0, Offtopic

    I was always wondering how google could mirror almost the entire internet and serve millions of hits, I mean, it would need super super super fast storage... DRAM is at least a step in that direction... They must have a fsckin LOT of it tho :) A few TB...

  4. Additionally by Phosphor3k · · Score: 4, Insightful

    How often do you see DRAM fail compared to Hard Disks? A bit more reliability IMHO.

    1. Re:Additionally by LWolenczak · · Score: 2

      I don't think I have ever seen DRAM fail, but I sure have seen my share of both ide and scsi drives die.

    2. Re:Additionally by sammy+baby · · Score: 2

      Exactly what I was going to say. DRAM has the "no moving parts thing" on its side, which is a pretty powerful bennie, if you ask me.

    3. Re:Additionally by LWolenczak · · Score: 2

      We had a scsi drive that died due to its circuit board going south.....

    4. Re:Additionally by VAXman · · Score: 4, Informative

      DRAM fails all the time. In fact, DRAM is almost certainly responsible for more data corruption than disks are. DRAM gets SBEs (single-bit errors) all the time, whereas when disks fail, they tend to go completely down and don't return corrupt data (which is preferable, IMHO). Of course, DRAM with ECC is significantly more reliable (and also more expensive).

    5. Re:Additionally by Tower · · Score: 1

      Most HD board failures are due to one of two things (or sometimes both):
      1) The components have been shocked (ESD) and weakened.
      2) The heat from the drive is not being properly removed, and the components get overly stressed.

      --
      "It's tough to be bilingual when you get hit in the head."
    6. Re:Additionally by darkwhite · · Score: 3, Insightful

      Very often. And the problem is, unlike hard drives, which will try their best not to return the data if they have a hint that it's corrupted (meta-data, checksums, etc.), DRAM will be more than happy to return the incorrect data, which then might get written to disk. Some of the errors I've seen due to corrupt DRAM are pretty amusing.
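
      To make the difference concrete, here is a toy sketch of the "refuse to return corrupt data" behaviour. It assumes a CRC32 stored next to each block; real drives use their own per-sector error-correcting codes, so treat this as an illustration of the principle only. All function names are made up for the example.

```python
import zlib


def store(payload: bytes):
    """Return the payload together with a checksum, as a disk sector might."""
    return payload, zlib.crc32(payload)


def load(payload: bytes, checksum: int) -> bytes:
    """Hand the data back only if it still matches its checksum."""
    if zlib.crc32(payload) != checksum:
        raise IOError("checksum mismatch: refusing to return corrupt data")
    return payload


def flip_bit(payload: bytes, bit: int) -> bytes:
    """Simulate a single-bit error at the given bit offset."""
    data = bytearray(payload)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)


block, crc = store(b"index shard 42")
corrupted = flip_bit(block, 3)

print(load(block, crc))            # intact data comes back
try:
    load(corrupted, crc)
except IOError as err:
    print("disk-style read:", err)
print("plain RAM read: ", corrupted)  # no check, the bad data is simply returned
```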

      --

      [an error occurred while processing this directive]
    7. Re:Additionally by Spoing · · Score: 2, Informative
      RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.

      When's the last time you checked your RAM? I get about 1 bad module for every 2 machines. Defects usually show up on the initial test, though some don't show up for a few years.

      Don't believe me? Try it yourself; Memtest86. I suggest running one full test (can take days) when you first build a machine, and when you run into odd problems that you can't figure out. The default tests are good, but I've had times where it did miss problems.

      --
      A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
    8. Re:Additionally by Anonymous Coward · · Score: 0

      Your .sig lacks insight.

    9. Re:Additionally by haruharaharu · · Score: 2

      RAM is a mechanical device

      Ram is an electronic device. It has no mechanical parts, save for the junction between it and the motherboard.

      --
      Reboot macht Frei.
    10. Re:Additionally by Hal-9001 · · Score: 3, Informative
      RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.
      RAM is not mechanical, it's capacitive, i.e. it operates by storing charge. One of the advantages of semiconductor, or solid-state, electronics over pre-transistor electromechanical relays and vacuum tubes is that they require no moving parts, making them more rugged and reliable.
      Defects usually show up on the initial test, though some don't show up for a few years.
      A curious thing about solid-state electronics is that a large number of parts fail initially, then the failure rate is constant for several years, and then the failure rate increases again. This is why electronics like CPUs and DRAM usually have a warranty of 30 days, because 99.9% of parts that are going to fail do so in 30 days. Contrast this with mechanical failure, which continually increases with time.
      --
      "It take 9 months to bear a child, no matter how many women you assign to the job."
    11. Re:Additionally by Blind+Lemon · · Score: 2, Interesting
      With hard disks you have things like RAID to protect against disk failure. No such thing with RAM. Sure, you can get protection from a bit going bad, but not for losing a chip.

      The company I work for makes computers with a lot of RAM and so we've been researching how to survive a RAM chip failure, but as far as I know no system implements such a technology.

    12. Re:Additionally by roguerez · · Score: 2
      This is why electronics like CPUs and DRAM usually have a warranty of 30 days, because 99.9% of parts that are going to fail do so in 30 days

      This makes no sense. A long warranty period makes a product sell better. When 99.9% of parts that are going to fail do it in 30 days, it's in the interest of the manufacturer to either have no warranty at all or a very short one (to prevent claims), or one that is very long, like 10 years or lifetime. After the first 30 days, hardly anything is going to break, so it would be stupid not to prolong the warranty period. This can be done essentially 'free'. And I've seen RAM that has a lifetime guarantee.

    13. Re:Additionally by lkaos · · Score: 2

      What?

      RAM is solid state. It is simply a circuit board with a couple of IC modules. There are absolutely no moving parts.

      The reason RAM goes bad is chiefly from operating temperatures and poor construction (mostly impurities in the air).

      There are absolutely no moving parts in RAM though. That is just silly to even suggest :)

      In fact, the only real moving parts in most PC's are the storage devices and fans...

      --
      int func(int a);
      func((b += 3, b));
    14. Re:Additionally by Defiler · · Score: 3, Interesting

      IBM sells this technology. They call it ChipKill.
      Perhaps this is what your company is looking for:
      ChipKill

    15. Re:Additionally by Chmarr · · Score: 3, Informative
      Ram has both an electronic component, and mechanical. Try this experiment: Take the RAM out of your computer and throw it at your workmate/housemate/mum. He or she will say 'Ow!', and it's not because he or she was hit by electrons!

      RAM heats up as it's used, metal expands, the Chips on that little PCB stretch slightly, joints weaken with each power cycle, sometimes they fragment. The same thing with the connectors to the motherboard.

      Telstra, in Australia, was having a hellish time with certain Cisco routers as the RAM heating up would eventually work its way out of the socket, crashing the router!

    16. Re:Additionally by SilentChris · · Score: 3, Insightful

      I've seen a lot of "logic" arguments to this post, but I think people are missing a sort of obvious one: size. If you had enough RAM as an average hard drive (say, 20 gigs) I'm sure that at least *one* piece would be faulty. You're comparing, in a best-case server scenario, a gig of RAM vs. a 80-gig hard drive. I think if the numbers were even it'd be a "fairer" fight.
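
      The size argument can be put in numbers with a back-of-the-envelope sketch. It assumes every gigabyte of RAM fails independently with the same probability, and the 1% per gigabyte is a made-up figure chosen only to show the shape of the curve.

```python
def p_any_fault(p_per_gb: float, gigabytes: int) -> float:
    """Probability that at least one gigabyte out of `gigabytes` is faulty,
    assuming independent faults with probability `p_per_gb` each."""
    return 1.0 - (1.0 - p_per_gb) ** gigabytes


P_PER_GB = 0.01  # hypothetical per-gigabyte fault probability

for size_gb in (1, 20, 80):
    print(f"{size_gb:>3} GB of RAM: {p_any_fault(P_PER_GB, size_gb):.1%} chance of at least one fault")
```

      The same per-gigabyte reliability looks much worse once you scale RAM up to hard-drive capacities.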

    17. Re:Additionally by Anonymous Coward · · Score: 0

      Mod parent up! A very good read, and educational for those of us with less exp in these areas. Good theoretical stuff too - and all that from a marketing document!

      Damn!

      Cheers!

    18. Re:Additionally by dstone · · Score: 2

      Yes, you should compare 80 gigs of HD versus 80 gigs of DRAM. First of all, you'll usually detect any DRAM faults upon your first powerup test (while it's still under warranty and, more importantly, no data has been trusted to it yet). Okay, so down the road now, DRAM really isn't very sensitive to wear-and-tear. It is, but not nearly to the degree of stepping motors, spinning platters, and crashing heads that need cooling and lubrication. And consider this benefit... if a fault is detected on one chip of an 80 gig cluster of DRAM, you can swap one chip, not the whole 80 gigs. (Either way, it'll likely require a power-cycle and data restore from backup though.)

    19. Re:Additionally by alex_ant · · Score: 2, Informative

      I agree that DRAM is certainly more reliable than hard disk storage, but I should point out that a computer's power-up "memory test" is more like a "memory count" than anything. The machine says it's "testing" the memory, but it's basically paging through it to make sure it's all there. It will miss all but the most severe memory problems.

      I speak from experience, as the owner of several past flaky PCs that had bad RAM, and the owner of an SGI Indigo2, which had a SIMM that would get parity errors every now and then that the POST (or whatever it's called on SGIs) would fail to detect. If you really want to test the memory, you're going to have to run some real memory-test software, which typically takes a loooong time to run (hours or days). That's because a great number of memory errors happen only slightly too frequently to be called flukes.

    20. Re:Additionally by Anonymous Coward · · Score: 0

      Does your sig always go with your comment that well? :-)

    21. Re:Additionally by Hal-9001 · · Score: 2

      The OEMs to whom CPUs and DRAM are usually sold know that 99.9% of the parts that are going to fail do so within 30 days--there's no point in trying to sway them with a longer warranty...

      --
      "It take 9 months to bear a child, no matter how many women you assign to the job."
    22. Re:Additionally by Hal-9001 · · Score: 2

      CPUs are the exception in this case. In general, you don't want solid-state electronics running so hot that fatigue due to thermal expansion is a factor.

      P.S. Nice attempt to make the RAM a moving part, but that doesn't mean it has moving parts... :-p

      --
      "It take 9 months to bear a child, no matter how many women you assign to the job."
    23. Re:Additionally by Anonymous Coward · · Score: 0

      Wouldn't you get pretty far simply by storing each bit of an ECC unit in a different chip?
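
      A toy sketch of that idea, assuming a Hamming(7,4) code and seven hypothetical one-bit-wide chips, with bit i of every codeword stored on chip i. It is not a description of ChipKill or of any real memory controller; it only shows why a whole dead chip costs each word just one bit, which the code can correct.

```python
def encode(nibble):
    """Hamming(7,4): 4 data bits -> 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # parity over positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # parity over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means clean, else the bad position
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def write_words(nibbles):
    """Spread each codeword over 7 chips: chip i holds bit i of every word."""
    codewords = [encode(n) for n in nibbles]
    return [[cw[i] for cw in codewords] for i in range(7)]


def kill_chip(chips, dead):
    """Simulate one chip failing completely (all its bits read as 0)."""
    return [[0] * len(col) if i == dead else list(col) for i, col in enumerate(chips)]


def read_words(chips):
    """Reassemble each codeword from the 7 chips and decode it."""
    n_words = len(chips[0])
    return [decode([chips[i][w] for i in range(7)]) for w in range(n_words)]


data = [[(n >> b) & 1 for b in range(4)] for n in range(16)]
chips = write_words(data)
print(read_words(kill_chip(chips, dead=3)) == data)
```

      The price is 7 chips for 4 chips' worth of data, and a second failed chip would defeat this particular code.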

    24. Re:Additionally by Anonymous Coward · · Score: 0

      Telstra, in Australia, was having a hellish time with certain Cisco routers as the RAM heating up would eventually work its way out of the socket, crashing the router!

      Or, more likely, Telstra had just misconfigured the routers, because they are all fucking idiots.

    25. Re:Additionally by Chmarr · · Score: 2

      Oh... don't misunderstand me. I'm not trying to pretend that RAM Is in any way a mechanical device like, say, a fan or harddisk is. I'm only saying that to say that RAM does not suffer from mechanical problems is incorrect... albeit doing it in a funny-ha-ha kinda way :)

  5. RAM vs. HDD by hitchhacker · · Score: 2, Redundant

    If google has something like 10,000 linux PC's, I would definitely think that using RAM and a ramdisk for the root partition would be cheaper than putting a hard drive in every PC. I would imagine that the hard drives would be the first to go if something failed.
    Obviously, if they used DRAM for their HUGE central databases, it would not be a cheaper solution.
    But, I'm talking out of my ass, because I don't know how their datacenter works.. anyone anyone?

    -metric

    1. Re:RAM vs. HDD by no1here · · Score: 1

      Google uses many off-the-shelf and used PC's. Therefore, it would leave one to believe that all the computers would come with a hard drive. Maybe they even get a discount for donating or sending back hard drives.

    2. Re:RAM vs. HDD by Anonymous Coward · · Score: 3, Interesting

      actually google uses freebsd on their PCs

    3. Re:RAM vs. HDD by Anonymous Coward · · Score: 0
      "Talking out of your a**"...


      mmm... That explains why your voice has changed but your breath smells the same!


      (Sorry, couldn't resist! ;-)

    4. Re:RAM vs. HDD by e40 · · Score: 1

      According to Peter Norvig, a bigwig there, they use Linux. I just saw a talk by him last week where he said this.

    5. Re:RAM vs. HDD by Anonymous Coward · · Score: 0

      I'm sure they use HDs. Just that HDs are for backups, or at least facilitating them.

    6. Re:RAM vs. HDD by Anonymous Coward · · Score: 0

      I heard they use DOS 4.0 and 2400 baud cassette tapes for storage.

    7. Re:RAM vs. HDD by Anonymous+DWord · · Score: 2

      Yup. It's a pretty-tweaked version of RedHat.

      --
      "If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
  6. Speed saves by coreman · · Score: 3, Insightful

    They make their money on hits served so speed is far more cost effective than cost of storage medium. If they can speed up serving hits, they're ahead of the game.

    1. Re:Speed saves by Anonymous Coward · · Score: 0

      If they can speed up serving hits, they're ahead of the game.
      Yeah, since if I get my hits in 0.6 seconds instead of 0.2, I eventually realise that altavista will be three times worth my time?

      Uh, no.

      Oh, and another thing. I once made google do a 30-second search by writing a script to look for the longest string consisting entirely of stop-words in my etext collection. It was funny. (Got my results though! And precisely 4 other documents in which the string occurred unrelated to my original document).

    2. Re:Speed saves by Anonymous Coward · · Score: 0

      Schmidt: Half of Google's revenue comes from selling text-based ads that are placed near search results and are related to the topic of the search. Another half of its revenues come from licensing its search technology to companies like Yahoo!.

      I'm just curious, how many halfs of Google's revenues are there?

    3. Re:Speed saves by Anonymous Coward · · Score: 0

      Looks like two.

    4. Re:Speed saves by coreman · · Score: 1

      Do the math. If they can service your request in a third the time, they can service three times as many hits with the same hardware. We're talking maximum load, not your convenience
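
      Spelled out as a sketch: assume each server works on one request at a time, so a server's capacity is one request per service time. The 0.6 s and 0.2 s figures come from the post above; the peak load of 1000 requests per second is a made-up number.

```python
import math

PEAK_REQUESTS_PER_S = 1000  # hypothetical peak load


def servers_needed(requests_per_s: int, service_time_ms: int) -> int:
    """Servers required if each one handles a single request at a time."""
    return math.ceil(requests_per_s * service_time_ms / 1000)


slow = servers_needed(PEAK_REQUESTS_PER_S, service_time_ms=600)
fast = servers_needed(PEAK_REQUESTS_PER_S, service_time_ms=200)
print(f"0.6 s per request: {slow} servers")
print(f"0.2 s per request: {fast} servers")
```

      Same load, a third of the service time, a third of the hardware. It ignores queuing and hot spares.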

    5. Re:Speed saves by Anonymous Coward · · Score: 0

      your "math", coreman, implies that if they could service each request in 0 seconds, they'd suddenly more than quadruple the number of searches people would be doing. I'm saying that wouldn't happen, since they are currently servicing EVERY request somebody makes, and NOBODY is refraining from using their service because 2.3 seconds is just too damn long. Therefore, whether it takes 2.3 seconds to service the average request, 3.8, 4.2, or 0, google gets just as many hits.

    6. Re:Speed saves by coreman · · Score: 1

      And you're making the assumption that there isn't a queuing latency in your times as well. If it truly is a server farm then there is some overhead in getting the request serviced but that is included in the overhead of .2 or .6 seconds and their costs (remember we were talking DRAM versus HD) are reduced by being able to do the same job with less hardware (plus some percentage for hot spares and peak usage)

    7. Re:Speed saves by Anonymous Coward · · Score: 0

      You are a fucking dumb-ass. It's not how long it takes to service an individual request, it's how much hardware do you need to service all the requests. Faster hardware == less hardware needed == less money.

      Sheesh, dude, it's not that complicated.

    8. Re:Speed saves by Anonymous Coward · · Score: 0

      haha, thank you. i'm so stupid. (If the poster had said "can service the requests with one third of the hardware" instead of "can service three times the requests with the same hardware" i obviously would have understood him or her much better.)
      thanks for calling me a fucking dumbass.

  7. From the article: Why DRAM is so fast by yerricde · · Score: 5, Informative

    I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

    When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:

    Schmidt: "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks -- which is kind of amazing. It turns out that DRAM is 200,000 times more efficient when it comes to storing seekable data. In a disk architecture, you have to wait for a disk arm to retrieve information off of a hard-disk platter. DRAM is not only cheaper, but queries are lightning fast."

    With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
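
    A quick sketch of the mirror arithmetic. The 5 ms disk seek is an assumed round number, and the 25 ns DRAM figure is simply that seek time divided by Schmidt's factor of 200,000; neither is a measured value.

```python
DISK_SEEK_NS = 5_000_000            # assumed 5 ms per disk seek
DRAM_SEEK_NS = DISK_SEEK_NS // 200_000  # 25 ns, derived from the 200,000x claim


def fetch_time_s(lookups: int, seek_ns: int, mirrors: int = 1) -> float:
    """Seconds to finish `lookups` random reads when the work is split
    evenly over `mirrors` identical copies of the data."""
    per_mirror = -(-lookups // mirrors)  # ceiling division
    return per_mirror * seek_ns / 1e9


MILLION = 1_000_000
print("one disk:          ", fetch_time_s(MILLION, DISK_SEEK_NS), "s")
print("1,000-way mirror:  ", fetch_time_s(MILLION, DISK_SEEK_NS, mirrors=1000), "s")
print("one copy in DRAM:  ", fetch_time_s(MILLION, DRAM_SEEK_NS), "s")
```

    Under these assumptions a single copy in DRAM still beats a thousand disk mirrors by a wide margin.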

    --
    Will I retire or break 10K?
    1. Re:From the article: Why DRAM is so fast by jackb_guppy · · Score: 4, Interesting

      A simpler way of saying this:

      Do you want to buy a machine that cost $100,000 per copy to do 1 Million Hits per X time.

      -or-

      Do you want to buy 1000 machines that cost $500 per copy to do 1000 Hits per X time.

      In both cases we are talking about 1 million Hits per X time.

      In case 1 - it costs a port on master switch and $100,000 for the machine.

      In case 2 - it costs 1000 ports on master switch -- actually more switches and infrastructure. AND $500,000 for the machines.

      Case 1 20% Cheaper then case 2. We have not talked of Power, A/C, Space... Need to look at the whole picture.

    2. Re:From the article: Why DRAM is so fast by Anonymous Coward · · Score: 0

      > Case 1 20% Cheaper then case 2.

      First, learn math, 'than' learn english...

    3. Re:From the article: Why DRAM is so fast by Anonymous Coward · · Score: 0

      Score +1 funny: Sorry jackb, your post was good, but the English and math was a little fuzzy. Heh.

    4. Re:From the article: Why DRAM is so fast by Anonymous Coward · · Score: 0

      ahahahahaah!!!!!
      +1 funny!

    5. Re:From the article: Why DRAM is so fast by Anonymous Coward · · Score: 0

      His math is correct, the way he stated it is incorrect. 100,000 _is_ 20% of 500,000. What he should have said is...that the costs of option 1 are 20% of those in option 2.

      The than thing is an easy thing to miss...but wrong none-the-less.

      Cheers!

    6. Re:From the article: Why DRAM is so fast by dillon_rinker · · Score: 2

      Case 1 20% Cheaper then case

      MATH ERROR! MATH ERROR!

      "A is X% cheaper than B" in English translates to:

      A = B - B * X / 100

      Or, take 20% of $500,000, subtract it from 500,000, and that's something that's 20% cheaper.

      Your statement would have been more accurate as follows:

      "Case 1 80% Cheaper then case 2" [sic]

      It would have had much more impact to say this:

      "Case 2 is 400% more expensive than case 1."

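      The three ways of stating the comparison, as a small sketch using the two totals from the grandparent post ($100,000 for one machine, 1000 x $500 for the farm). The function names are just illustrative.

```python
CASE_1 = 100_000      # one big machine
CASE_2 = 1000 * 500   # a thousand cheap machines


def percent_of(a: int, b: int) -> float:
    """A expressed as a percentage of B."""
    return a * 100 / b


def percent_cheaper(a: int, b: int) -> float:
    """X such that A = B - B * X / 100."""
    return (b - a) * 100 / b


def percent_more_expensive(b: int, a: int) -> float:
    """X such that B = A + A * X / 100."""
    return (b - a) * 100 / a


print(f"case 1 is {percent_of(CASE_1, CASE_2):.0f}% of case 2")
print(f"case 1 is {percent_cheaper(CASE_1, CASE_2):.0f}% cheaper than case 2")
print(f"case 2 is {percent_more_expensive(CASE_2, CASE_1):.0f}% more expensive than case 1")
```
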
  8. I've always wondered by Lord+Hugh+Toppingham · · Score: 2
    Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?

    AFAIK Linux and Open BSD cannot do this either. It seems amazing to me that people have missed this idea.

    1. Re:I've always wondered by uncl_bob · · Score: 1, Informative

      Actually, not that much of the operating system is pulled from the hard drive once the system is up. Maybe some special parts of Windows like IE and other things would benefit from being in RAM, but not the whole C:\windows tree.

    2. Re:I've always wondered by propstoalldeadhomiez · · Score: 1, Informative

      There's an option in Win2k to not swap portions of the kernel out. If you have 128 MB of RAM or more, it's probably a good idea, too. The whole thing doesn't need to be in memory the whole time, just what you use the most.

      --

      Jack Buck (1924-2002)
      Darryl Kile (1968-2002)
    3. Re:I've always wondered by MarkusQ · · Score: 2, Informative
      Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?

      AFAIK Linux and Open BSD cannot do this either. It seems amazing to me that people have missed this idea.

      You can do it in Linux (and probably in Windows too, though I'm not sure how), but there generally isn't a reason to. The VM/RD cycle swings back and forth over the years, but at present the PC world seems to be running best with a 2:1 VM ratio (using a chunk of HD about twice your RAM size to simulate more RAM), although part of this is that RAM is being used up by smart caching of disk. This holds for Windows, Linux, and (IIRC) Open BSD.

      So, the short answer is: you could do it, but it would likely slow you down overall.

      -- MarkusQ

    4. Re:I've always wondered by bargle · · Score: 1

      There's no good reason to. Modern operating systems cache recently accessed files in memory. True, there might be some point in precaching files (like with a ram disk), but also, with a ram disk, you can't eject files that never get touched out of your valuable memory space.

      That's why installing more memory makes a computer "faster" -- even if your application doesn't use it, it will still be used by the operating system to cache files.

      --
      Would you shut up already?
    5. Re:I've always wondered by Anonymous Coward · · Score: 0

      This is quite possible with Linux. I run a machine that boots from CD, and then operates completely from a ramdrive - no hard disk at all.

    6. Re:I've always wondered by Anonymous Coward · · Score: 0


      The shorter answer is: you read your OS once, then it is in RAM. Why keep it in an extra ramdrive?

    7. Re:I've always wondered by Cylix · · Score: 3, Informative

      I looked into using a virtual ram disk for a section of data that was being accessed quite frequently. Of course I did some reading and it turned out not to be terribly necessary.

      The more memory present in the system, the more memory the linux kernel dedicates to caching. Thus commonly read files are in memory and have incredibly fast reads. This is performed auto-magically without the user even being aware of it.

      Of course no two situations are exact and you may have a purpose for dedicating a ram disk to something. There are instances where you may want a fast read/response time, but the file isn't commonly used. Such as the data for a squid proxy cache. A ram disk in such a situation would be entirely helpful.

      --
      "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
    8. Re:I've always wondered by jc42 · · Score: 2, Interesting

      Huh? Go to handhelds.org and look at the specs for the various linux handhelds. Few if any of them have hard disks; everything is run out of memory. This doesn't seem to have been much of a problem with linux (or any of the unix clones). A "ramdisk" isn't exactly a new concept in the unix environment.

      In fact, this sort of trick was exactly why the unix "block device" abstraction was invented more than a quarter century ago. It allows you to have a file system on anything that can store data in addressable chunks called "blocks". Memory works just fine for this.

      An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa). This causes most apps' temp files to be in main memory, and eliminates rotational delays for these files.

      There's no real problem with mapping the entire file system to memory.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    9. Re:I've always wondered by Anonymous Coward · · Score: 0

      You can do it with OpenBSD, you obviously don't know how.

    10. Re:I've always wondered by AA0 · · Score: 1

      I think they were going to do this, but then they realized that everyone with AOL would fill up that 512 megs soon as they connected.

    11. Re:I've always wondered by Halvard · · Score: 1

      Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?

      In fact, many if not most minimalistic Linux distros do this. Specifically, the Linux Router Project does this. I use it extensively. The kernel boots and the file system decompresses to a RAM drive. It's very fast.

    12. Re:I've always wondered by tshak · · Score: 2

      Caching the entire kernel and commonly used DLLs is supported in WinXP (Pro, not sure about Home). I believe there is undocumented support in Win2K but I have not verified this. A friend of mine built a machine with 512MB of RAM, put XP on it, and enabled this "cache" feature. Although the boot time was a little (barely noticeably) slower, the load time of apps and common tasks was incredible, almost as if you were using a solid-state device (a PDA, for example).

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
    13. Re:I've always wondered by byran+lei · · Score: 1

      >Why windows does not run off a ramdrive. I mean, modern PCs all have
      >at least 512MB ram, why not load up Windows once, and then never
      >access the disk drive again?
      >
      >AFAIK Linux and Open BSD cannot do this either. It seems amazing to me
      >that people have missed this idea.
      >
      You can run Linux off a ramdisk. The Partition Image backup software for Linux does this when you boot it up from either floppy or cdr/cdrw. Most people just choose not to. The only computer that quite frankly needed to be run from a ramdisk was the Amiga, and that was because Amiga floppy and hard drives were so expensive and fucking slow.

    14. Re:I've always wondered by haruharaharu · · Score: 2

      An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa).

      This was because SunOS had a dog-slow filesystem; even today, /tmp is usually backed by ram. Linux (and probably BSD) has a fast enough filesystem that this isn't an issue

      --
      Reboot macht Frei.
    15. Re:I've always wondered by Anonymous Coward · · Score: 0

      In Windows XP I know that you can load the whole kernel into RAM as long as you have at least 512MB.

    16. Re:I've always wondered by Anonymous Coward · · Score: 0

      Hey guess what? The Amiga has been doing this for 10 years. It has a recoverable ram drive, when you reset the machine it boots off the ram drive.
      Another example of being ahead of your time that just doesn't pay. Hell, the Commodore 64 can do that with the ram cartridge!

    17. Re:I've always wondered by Anonymous Coward · · Score: 0

      The Amiga had a recoverable Ram Drive device, allowing you to reboot from RAM rather than (floppy) disk. That's something I'd like to see on windows :-), and a similar thing might be used to reduce downtime if your (or Google's, getting back on-topic) server goes belly-up. Assuming you have a few hundred MB of RAM to spare.

    18. Re:I've always wondered by Cramer · · Score: 1

      s/Win2k/NT/ (That option has been around for a long time.)

      The DisablePagingExecutive key doesn't do a whole lot. Most of the kernel is still paged out.

    19. Re:I've always wondered by Lord+Hugh+Toppingham · · Score: 0
      you obviously don't know how.


      No OBVIOUSLY I don't... Perhaps that is why I used the acronym AFAIK. Look it up. And if you want to be helpful instead of simply insulting, post the details of how it is done in OpenBSD.

    20. Re:I've always wondered by Bert64 · · Score: 1

      AmigaOS could do this since 2.0 at least, possibly even 1.3, using a device called RAD:, a reset-resident ramdisk.
      I had my AmigaOS configured to boot from HD, make a RAD: disk with a higher boot priority than the HD, copy the OS over there, and replace the startup script, then reset... and up it would come from the ramdisk until I powered down.
      It was also possible to assign ENVARC: (where saved system settings are stored) to the HD, while keeping ENV: (system settings currently in use) in RAM, as it is by default.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    21. Re:I've always wondered by MoneyT · · Score: 1

      Computers used to store their entire OS in memory (see Commodore 64). But with modern-day OSes, changes to the system files occur too often for the old approach to work.

      However, many systems can now run from a RAM disk. Most mini Linux distros do. Mac OS has an option to create a RAM disk and have it saved to the HD when you shut down (so theoretically you could start from the RAM disk, as I believe the first thing it does is read the file back into RAM).

      --
      T Money
      World Domination with a plastic spoon since 1984
    22. Re:I've always wondered by Anonymous Coward · · Score: 0

      The Win2K option is to keep the KERNEL in memory, there is no option for DLLs (as far as I know). Windows will keep DLLs around for a couple of minutes, and that's default behaviour. But you can't force it.

      http://www.winguides.com/registry/display.php/399/

    23. Re:I've always wondered by psamuels · · Score: 2
      This was because SunOS had a dog-slow filesystem; even today, /tmp is usually backed by ram. Linux (and probably BSD) has a fast enough filesystem that this isn't an issue

      There's also the small matter of write-back caching - any modern OS should cache writes aggressively (or at least should have the option), such that short-lived temp files (you know, the ones whose speed matters most) usually never reach the platter before being deleted anyway.

      --
      "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
    24. Re:I've always wondered by Ginsu2000 · · Score: 1

      Yep, I used this same trick on my 1.2 Amiga, with a Workbench 1.3 disk and 1 MB of RAM! I could copy the Workbench to RAD and could reboot as quick as flash. It was *wonderful*.
      In fact, I used to load Carrier Command to RAD, because it crashed so often, and I wanted to exit it before it died, and reload it quickly. Or maybe I just liked seeing it load *SO FAST*. Anyone know if RAD is supported with WinUAE?

    25. Re:I've always wondered by jafac · · Score: 2

      There IS a registry hack somewhere for NT/2k that supposedly "keeps the OS in RAM for faster performance"

      It actually works.

      http://www.winguides.com/registry/display.php/399/

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    26. Re:I've always wondered by Anonymous Coward · · Score: 0

      "Go to handhelds.org and look at the specs for the various linux handhelds. Few if any of them have hard disks; everything is run out of memory."

      True, but don't compare that to a ram drive. They are structured differently and are different than running an os off rom.

      Ramdisks are normally created dynamically from available RAM. There are also devices that slide into a drive bay and look and act exactly like a SCSI or IDE device, but are actually a circuit board and rows of SIMMs/DIMMs. I haven't seen any recently, but I used them a couple of years ago. Such a device even takes over the overhead of managing the ramdisk dynamically.

  9. Scary! by Anonymous Coward · · Score: 4, Insightful

    Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
    Now if only Google could go out and do its own fact-checking, it wouldn't need to rely on other newspapers at all. Mark my words, by 2010 google will be the only place you go when you need information. Forget askjeeves, try listentogoogle. No humans will be involved. Scary.

    By the way, this guy can't speak for beans.
    The speech I give everyday is: "This is what we do. Is what you are doing consistent with that, and does it change the world?"

    1. Re:Scary! by Phosphor3k · · Score: 5, Funny

      The system goes on-line on August 4th, 1997. Human decisions are removed from strategic searching. Google begins to learn, at a geometric rate. It becomes self-aware at 2:14 am, eastern time, August 29th. In a panic, they try to pull the plug.

      Google fights back.

    2. Re:Scary! by Fissure_FS2 · · Score: 2, Funny

      Just my luck. Our favorite search engine takes over the world on my birthday.

      I can imagine it now: just as I am about to blow out the candles, a giant DRAM chip bursts out of the cake and says, "I am Google. I am here to protect you. I am here to protect you from the terrible secret of space... er, the web."

      --
      My life's goal is to get a score of +3!
    3. Re:Scary! by Anonymous Coward · · Score: 0
    4. Re:Scary! by Mr+Z · · Score: 2, Informative

      And mine, too. Actually, in case you didn't recognize it, the original poster's scenario comes directly from the Terminator series. Skynet became sentient on August 29th, 1997. (Which was, incidentally, my 22nd birthday.)

      --Joe
    5. Re:Scary! by LafinJack · · Score: 1

      The REALLY scary part is that you know Google could do it.

      --
      we are building a religion
      a limited edition
      we are now accepting callers
      for these pendant key chains
    6. Re:Scary! by madcow_ucsb · · Score: 1

      Don't feel too bad...the Mayan calendar is gonna end on my 32nd birthday (12/23/2012). That could be an interesting party.... :)

    7. Re:Scary! by Anonymous Coward · · Score: 0

      Google became self aware on January 8, 2001. Don't be afraid, she's a sweetheart...

    8. Re:Scary! by squant0 · · Score: 1

      And we get "The Matrix" a little earlier than we expected....
      Do you really think DRAM could produce solar cells / nuclear power... if they could, we should be careful... I mean, if it is millions of searches a second faster than a hard drive, who knows?!?
      :-P

  10. Once again a simplistic view by damieng · · Score: 3, Informative

    I often see comments like this from people who have little experience in business.

    What you pay for the initial product is not what it "costs" in the long-term. Businesses have a term for this called TCO or Total Cost of Ownership. It includes all the other time and materials needed to keep the item in use.

    I would imagine in this case that the simple reason is that while DRAM is more expensive to purchase, it is a *lot* less expensive to run, the primary cost being power.

    Also consider that if speed is of the essence, as it is with Google, it's not 50GB of RAM vs a 50GB cheap-n-cheerful IDE drive. A 50GB Ultra160 drive costs considerably more than an IDE drive and still won't come near the DRAM for speed.

    --
    [)amien
    1. Re:Once again a simplistic view by NNKK · · Score: 2, Insightful

      Stack reliability, as someone else mentioned, on top of power and speed savings.

      Personally, I seriously doubt that all or even close to all of the stuff Google stores is kept in DRAM. It's more likely they keep newer data and high-access data in DRAM, while older stuff gets archived to disk, available for recall later, but slower.

    2. Re:Once again a simplistic view by Alomex · · Score: 2

      Personally, I seriously doubt that all or even close to all of the stuff Google stores is kept in DRAM

      You better believe it. Altavista already did that a long time ago. Hotbot (inktomi) had a similar all-in-memory scheme. Since Google is faster than those two, all the more reason to believe that the data is in DRAM (although surely they have backups in HDs and tape, but that is a different story).

    3. Re:Once again a simplistic view by Anonymous Coward · · Score: 0
      > Also consider that if speed is of the essence, as it is with Google, it's not 50GB of RAM vs a 50GB cheap-n-cheerful IDE drive. A 50GB Ultra160 drive costs considerably more than an IDE drive and still won't come near the DRAM for speed.

      Last time I saw an article about google it said google uses off the shelf 8GB IDE drives (maybe they use bigger drives now, but still mid to low end).

      I can't find the link but I'm sure someone will post in hopes of some sweet karma.

    4. Re:Once again a simplistic view by spudnic · · Score: 1

      What I would do if I were going to implement a system like this would be to store all the pages that my spider pulled in to disk, then create an index of that entirely in DRAM. If for some reason you lose the contents of DRAM, it can start reindexing, or copy the existing index from disk.

      .

      --
      load "linux",8,1
    5. Re:Once again a simplistic view by Anonymous Coward · · Score: 0

      I often see comments like this from people who have little experience in business.

      Most of these people are probably the ones who say "A college degree is worthless. I don't need to know all that stuff that has nothing to do with my career. I can program in 28 different languages."

    6. Re:Once again a simplistic view by Anonymous Coward · · Score: 0

      > then create an index of that entirely in DRAM

      Think again. To be usable (*), your index will be bigger than your data. For a lot of purposes, you can get better results with very weak indexing schemes if the data costs nothing to retrieve.

      (*) Try to understand how you are going to execute requests like ("foo" near "bar buzz")

    7. Re:Once again a simplistic view by cc.Scotty · · Score: 1

      Well, I'm no hardware engineer, but here are my additional observations on the advantage of DRAM, or of fewer disks in your infrastructure. In my company's colocation facility we have some pretty hefty servers, but nowhere near what I would imagine Google would need. Our db servers have SAN disk storage, which requires us to have redundant fiber channels to disk shelf after disk shelf filled with 36GB 15k drives. Of course each of our servers, the primary and the hot standby, has redundant power supplies and is always on. Both of the fiber channel switches have redundant power supplies, and of course, each of the drive shelves has redundant power supplies. Now, to accommodate all the power draw we have to pay an extra several thousand a month because we have exceeded our base level of service with our co-loc contract. Believe me, they aren't just passing their cost of electricity on to us. So now we have a recurring monthly cost to keep a whole lotta power supplies running.

      Now onto the drives. I seem to recall RAID 5 is our usual preference, but clearly we couldn't perform like Google, so Google would likely choose RAID 0/1, which is the most expensive choice for RAID. I could easily see them spending $500k on drives.

      Now Google's advantage, from my point of view, is that they don't really need redundancy at all. After all, they are somewhat redundant themselves. If they lose some data due to a failure, they can just run a crawl from another of their systems to reacquire the data they lost. So nix the redundant disk levels and all of the electricity that goes with them. We could probably nix the redundant fiber too. Now, with scads of DRAM, I would imagine local disks running in a server just to initialize the system and database in memory on boot-up. So now we can nix the fiber altogether, and probably the standby servers too.

      Now two things occur to me:
      1) Did Google purchase and use all of the infrastructure I just described before they came to the conclusion that DRAM was cheaper?
      2) Could they use a RAID 5-like redundancy of data among the contents of their servers, so they can tolerate the failure of a server without downtime?

      Of course, all of those green blinking lights at our co-loc look cool. We are the envy of our neighbors at the facility. Maybe it is worth it?

    8. Re:Once again a simplistic view by Anonymous Coward · · Score: 0

      I often see comments like this from people who have little experience in business.

      I often see this gobbledygook from MBAs who believe business is a natural law.

      Please, go beat your head against a brick - the world doesn't need your pseudo-science... and take those moron Economists with you also.

    9. Re:Once again a simplistic view by Anonymous Coward · · Score: 0

      You mean 80GB, not 8GB. It was two 80GB drives per machine (~8000 Linux boxes), on two independent IDE controllers.

      Yes, IDE.

      I can't find the story using a google search either, but I'm certain that's correct, as I was quite surprised by it at the time.

    10. Re:Once again a simplistic view by Anonymous Coward · · Score: 0

      Certainly DRAM uses less power than a mechanical hard drive.

  11. The key to it being cheaper is.... by rayd75 · · Score: 3, Insightful

    That it can handle many clients with little latency... You'd have to duplicate the data across a huge number of disks to provide similar response time to clients. Sure, if you were the only client, you couldn't tell the difference but with thousands upon thousands of clients all seeking data that would be stored in different locations on a disk things would quickly grind to a halt. Because so much unrelated data is being requested, seek time is the key. Sure, memory is more expensive per meg but its ability to serve so many more clients makes it less expensive overall.

    1. Re:The key to it being cheaper is.... by Anonymous Coward · · Score: 0

      I realize that RAM is a few orders of magnitude faster than disk, but I think that disk drives are not being given enough credit. Methods have been devised to spread information across disks so that data is not replicated on every disk while the load is still distributed across them. It is called hashing. Granted, I have yet to see an RDBMS (or any other searchable system) using this method that distributes the load evenly across all of the disks, but there are still some pretty darn good ones out there. For example, I was looking at some stats that Walmart put out for their "Crystal Ball" system, and they said that it received up to 8.4 million updates per minute in the background while doing its main processing in the foreground. I don't really have a good grasp of what the system can really handle, but it sure is a heck of a lot. (Now, this is a system that stores dozens of terabytes across thousands of disks.)
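
      As a rough illustration of the hashing idea, here is a minimal Python sketch that maps each record key to one of several disks. Everything in it is an assumption made for the example (the `disk_for_key` helper, the ten disks, the `record-N` key names); it does not describe how any particular RDBMS places data.

```python
import hashlib
from collections import Counter


def disk_for_key(key: str, num_disks: int) -> int:
    """Pick a disk for a key by hashing it; the same key always maps to the same disk."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce modulo the disk count.
    return int.from_bytes(digest[:8], "big") % num_disks


NUM_DISKS = 10  # hypothetical disk count
keys = [f"record-{i}" for i in range(10_000)]  # hypothetical record keys

# Count how many keys land on each disk to see how even the spread is.
load = Counter(disk_for_key(k, NUM_DISKS) for k in keys)
for disk in range(NUM_DISKS):
    print(disk, load[disk])
```

      Each record lives on exactly one disk, so nothing is replicated, and a good hash function keeps the per-disk counts close to each other. It does not help when a few keys are requested far more often than the rest.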

  12. Imperial MegaRam? by Ben+Jackson · · Score: 4, Interesting
    They may be referring to Imperial Technology's MegaRam solid state disks (SSDs). They claim about 36,000 IO/sec. Compare that with 80-120 IO/sec on a typical SCSI drive. I'm pretty sure that eBay is using them.

    I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.

    1. Re:Imperial MegaRam? by Anonymous Coward · · Score: 1, Funny

      Under Solaris, if the system time is way high, then you need to re-think your application's architecture. Under Linux, though, if the system time is way high... well, what's new? ;-)

    2. Re:Imperial MegaRam? by ottffssent · · Score: 2

      100-120 IO/sec? These SCSI drives you're talking about are several years old.

      Storagereview reports 120 IO/sec for Western Digital's top IDE drive, the WD1200BB. See Storage Review. A top-end SCSI drive such as the Seagate X15-36lp performs on the order of 360 IO/sec. See Storage Review.

      For the interested, the X15 runs about $14/G, and you would need about 100 drives to equal the IO/sec of the RAM drive. That's $1400/G; minimum $60,000, and about 700W power consumption and about 2kW total when you add cooling to that. High quality PC2100 DDR is in the $550-600 range per gig, and about 10 watts after cooling.
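
      The drive-count arithmetic above can be reproduced as follows. This is a sketch using only the figures quoted in this subthread, and it assumes that IO/sec adds up linearly across drives and that every drive has to hold the same data, which is what makes the price per usable gigabyte multiply.

```python
import math

ram_disk_iops = 36_000  # IO/sec claimed for the solid state disk in the parent post
scsi_iops = 360         # IO/sec quoted for a top-end 15k SCSI drive
price_per_gb = 14       # dollars per GB quoted for that drive

# Number of drives needed to match the RAM disk on IO/sec alone.
drives_needed = math.ceil(ram_disk_iops / scsi_iops)

# Assumption: each gigabyte is stored on every one of those drives.
effective_price_per_gb = drives_needed * price_per_gb

print(drives_needed)
print(effective_price_per_gb)
```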

  13. Fewer servers needed by michaelmalak · · Score: 5, Interesting
    I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?
    Google's Eric Schmidt probably means that fewer replicated servers are needed. If we take his stat of 200,000x speedup at face value, then you would need 200,000 times as many hard-drive-based servers as DRAM-based servers. There are many other factors involved such as communication delays and scalability, but you get the idea.

    This just shows how limited the lifespan is of 32-bit 4GB architecture, especially for servers.

    1. Re:Fewer servers needed by The+Smith · · Score: 1

      In fact, the PIII and successors have a 36-bit addressing extension, allowing them to use up to 64GB of physical memory. You can still only access 4GB at a time, but that's all dealt with in the OS kernel.
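
      The 4GB and 64GB limits follow directly from the address widths. A quick check in Python, with a helper name made up for this example and "GB" taken to mean 2^30 bytes:

```python
def addressable_gib(address_bits: int) -> int:
    """Bytes reachable with the given number of address bits, expressed in GiB."""
    return 2 ** address_bits // 2 ** 30


print(addressable_gib(32))  # plain 32-bit addressing
print(addressable_gib(36))  # 36-bit physical addressing extension
```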

    2. Re:Fewer servers needed by ErikZ · · Score: 2


      I want to know HOW they are doing this. Are they using PIIIs with 64GB of memory?

      --
      Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
    3. Re:Fewer servers needed by The+Smith · · Score: 2, Informative

      Yes, but it's all rather confusing. Read this thread in the Linux kernel mailing list if you're really interested. (WARNING: You won't understand any of it unless you know how the x86 virtual memory mechanism works.)

    4. Re:Fewer servers needed by eggboard · · Score: 1

      Thank you. I can't believe with all the brilliant Slashdot minds it took this long for someone to post the sensible analysis!

      It's all about speed versus hardware at Google: they will always add hardware to keep the speed of queries consistent. If you can add less hardware (possibly 10% of what you'd need with a hard disk dependency) by relying on RAM, you save a bundle and keep performance at a reasonable rate.

      Also, depending on how linear the curve is, you throw another 1GB in 100 machines instead of buying another 1,000 machines.

      --
      Freelance tech journalist for the Economist, MIT Technology Review, Macworld, and others
    5. Re:Fewer servers needed by Anonymous Coward · · Score: 0

      Something else to be considered is that RAM is less prone to failure than hard drives. There will be less overhead with diagnosing and replacing bad RAM than there would be with equivalent hard drives.

      Remember that since hard drives have moving parts, they do have a life span. If you're lucky, you'll get 10+ years out of your hard drive, but Google would certainly have to deal with the drives that die early given how many of them they would need.

  14. I believe it... by josh+crawley · · Score: 3, Informative

    At my dad's work, they use a type of chip, but it's not dram. They use E^2prom. True, you do take a performance hit, but they have 10 "gig ethernet ports" on the thing. The last price quote I got was $12000 for a terabyte of this stuff. Don't forget to compare price/performance ratios to the best chipsets of IDE (or if you're a scsi bigot, SCSI). Pulling random data is very easy for chips, but HD's of ANY speed and quality are still slower.

    Josh Crawley

    1. Re:I believe it... by Anonymous Coward · · Score: 0

      or if you're a scsi bigot, SCSI

      Well Josh, I wonder if maybe you happen to be a IDE bigot?

      hmmmmmmmm

    2. Re:I believe it... by _Knots · · Score: 1

      Urhm.... I doubt anybody'd be using EEPROM for something as dynamic as web indexing.

      1) It's slow to write and must be erased either byte-at-a-time or block-at-a-time.

      2) Most EE chips can't store data well after 100000 write-erase cycles.

      Given that google updates every other day or so, I highly highly doubt they're using E^2.

      --Knots

      --
      Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
    3. Re:I believe it... by iriles · · Score: 1


      I don't know much about EEPROM, but the points you brought up don't seem to make sense. Maybe I misunderstand.

      1) slow to write shouldn't be a problem they only write new updates to the index once a day (or less).

      2) and 100000 write cycles at once per day means they wouldn't have to change chips for another 270 years.
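
      The endurance estimate in point 2 can be checked with a short calculation. It is a sketch under a strong simplifying assumption: each index update costs every cell exactly one write-erase cycle. The function name is made up for this example.

```python
def years_until_worn_out(write_cycles: int, updates_per_day: float) -> float:
    """Years before the cycle budget is used up, assuming one cycle per cell per update."""
    return write_cycles / updates_per_day / 365


# One full rewrite per day, as in the post above.
print(years_until_worn_out(100_000, 1))

# One full rewrite every other day, the update rate suggested in the parent post.
print(years_until_worn_out(100_000, 0.5))
```

      Under that assumption the daily case comes out a little under 274 years, in line with the rough figure quoted above.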

      I guess what I am saying is that web indexing isn't very dynamic at all.

    4. Re:I believe it... by josh+crawley · · Score: 1

      You're damn right I am. UltraWide SCSI looks superb, but any basic card (notice I'm saying basic, not big-bucks high end) worth its salt is an Adaptec one. Those things start out at $75. I can get IDE standard on a mobo. Mobo IDE chipsets aren't known for their speed, but they are on there. If there were consumer-level UltraWide SCSI mobos on the market, I'd use UltraWide.

      I use what they provide. No sense on buying extra equipment when the motherboard chipset can do it.

      Josh Crawley
      PS: I'm cheap when it comes to building secondary computers. My primary gets all the good stuff.

  15. RAM Disks by buckrogers · · Score: 3, Interesting

    If they made a 2GB RAM Drive in each of their 10,000 machines then that would be 20 TB of storage. This seems sufficient to me for most storage needs.

    You would still need to be able to direct searches to the machines that have the part of the data you need. This would take a high speed network and some clever programming. But it is doable.

    I was always amazed at the speed of Google's search engine; now I have a little more of a clue as to why it is so fast.

    Sounds to me like they might be able to sell their database software as a money making product at some point. Oracle, watch out!

    --
    -- Never make a general statement.
    1. Re:RAM Disks by epsalon · · Score: 2

      20TB is peanuts for a search engine the size of Google. Google's needs are closer to 500TB, or even a few PB. Don't forget the cached pages and the usenet archive! This stuff should take at least a few PB.

    2. Re:RAM Disks by graxrmelg · · Score: 3, Insightful

      Google doesn't need petabytes of storage. Right now they claim 2 billion Web pages, 700 million Usenet messages, and 330 million images. That's a total of 3 billion things. Let's wildly overestimate their average size as 100K (remember that the Usenet archive doesn't include binaries). The storage space required would be 3e9 * 1e5 = 3e14, or 300 TB.

      It's probably true that 20 TB isn't enough for Google, but it's not true (and won't be for quite a while) that the cached pages and Usenet archive require "a few PB".
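
      The estimate above works out as follows in Python. The item counts and the deliberately inflated 100K average size are the ones given in the post; the variable names are made up, and 1 TB is taken as 10^12 bytes.

```python
web_pages = 2_000_000_000
usenet_messages = 700_000_000
images = 330_000_000

total_items = web_pages + usenet_messages + images  # about 3 billion
average_size_bytes = 100_000                        # the "wildly overestimated" 100K

# Same rounding as the post: 3e9 items * 1e5 bytes.
rounded_total_tb = 3_000_000_000 * average_size_bytes // 10 ** 12

# Using the unrounded item count instead.
exact_total_tb = total_items * average_size_bytes // 10 ** 12

print(total_items)
print(rounded_total_tb)
print(exact_total_tb)
```

      Even with the inflated average size, the total stays well under a single petabyte (1,000 TB).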

    3. Re:RAM Disks by buckrogers · · Score: 2

      Guess what? Google doesn't cache images! And I bet they compress the cached page too.

      So, let's get wild and say that there is 120TB of html pages that we care about... if you compress these pages then they would fit in 10 TB. Still plenty of room on a 20 TB RAM Disk for the index to all these pages.

      And besides, I'm just guessing... They might have 8GB of RAM in every machine, for all I know.

      --
      -- Never make a general statement.
    4. Re:RAM Disks by Anonymous Coward · · Score: 0
      20 TB of storage. This seems sufficient to me for most storage needs.


      Most toy computer storage needs, maybe. My current clients aren't designing any systems with less than 50TB.

    5. Re:RAM Disks by great+throwdini · · Score: 1

      Guess what? Google doesn't cache images!

      Ummm... think again. Granted, they don't seem to cache the original image, but they seem to cache a smaller thumbnail, as I often am able to browse a thumbnail for an image no longer extant on the originating server.

    6. Re:RAM Disks by um...+Lucas · · Score: 2

      It doesn't look like they're storing the image at all... Just the text around the image for the search, and then the results page is actually pulling the image from its originating server...

    7. Re:RAM Disks by Perdo · · Score: 2

      They don't sell it, they licence it. That is Google's primary source of income.

      --

      If voting were effective, it would be illegal by now.

  16. Five minute rule by NearlyHeadless · · Score: 3, Informative
    The raw cost of DRAM ($/MB) is still much higher, but that is not the complete analysis. Database god Jim Gray's analysis shows that you should keep data in memory if it is going to be accessed every five minutes or less.


    See The Five-Minute Rule, ten years later (Word Doc) or its HTML-ified Google Cache
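
    For readers who don't want to open the Word doc, here is a sketch of the break-even calculation as I understand it from that paper: a page is worth keeping in RAM if it is re-read more often than the interval computed below. The input values are illustrative, roughly the late-1990s figures I recall the paper using, so treat all four numbers as assumptions.

```python
def break_even_interval_seconds(pages_per_mb_ram: float,
                                accesses_per_sec_per_disk: float,
                                price_per_disk: float,
                                price_per_mb_ram: float) -> float:
    """Seconds between accesses at which caching a page in RAM costs the same
    as paying for the disk arm time needed to re-read it."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# Assumed inputs: 8 KB pages (128 per MB), 64 random accesses/s per drive,
# $2000 per drive, $15 per MB of DRAM.
interval = break_even_interval_seconds(128, 64, 2000.0, 15.0)
print(f"break-even interval: {interval:.0f} s (~{interval / 60:.1f} minutes)")
```

    Cheaper RAM stretches the interval, so more data qualifies for memory; faster or cheaper disks shrink it.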

  17. price comparison by karmma · · Score: 4, Informative

    Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.

    In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the hard drive option by a factor of greater than 25. If you do the math, it's possible.
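
    The break-even argument can be sketched as cost per unit of work. The two prices are the ones quoted above; the 100x speedup is a purely hypothetical figure chosen to show the comparison, not a measurement.

```python
DRAM_DOLLARS_PER_GB = 250.0  # price quoted in the comment
RAID_DOLLARS_PER_GB = 10.0   # price quoted in the comment


def break_even_speedup(dram_price: float, disk_price: float) -> float:
    """Speedup DRAM must deliver before it wins on cost per unit of work."""
    return dram_price / disk_price


def dollars_per_unit_of_work(price_per_gb: float, relative_throughput: float) -> float:
    """Price divided by throughput, with disk throughput normalized to 1."""
    return price_per_gb / relative_throughput


speedup_needed = break_even_speedup(DRAM_DOLLARS_PER_GB, RAID_DOLLARS_PER_GB)

hypothetical_speedup = 100.0  # assumption for illustration only
dram_cost = dollars_per_unit_of_work(DRAM_DOLLARS_PER_GB, hypothetical_speedup)
disk_cost = dollars_per_unit_of_work(RAID_DOLLARS_PER_GB, 1.0)

print(f"speedup needed to break even: {speedup_needed:.0f}x")
print(f"DRAM: ${dram_cost:.2f} per unit of work, disk: ${disk_cost:.2f}")
```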

    Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.

    1. Re:price comparison by bdolan · · Score: 2, Insightful

      If you have heavily hit database indexes, e.g. Google, then you may need 100-1000x fewer machines. The cost of the disks is not the important cost; it is the far smaller number of machines needed for an equivalent query rate. However, you want to have far more than 2GB of directly addressed RAM per machine--in fact at current prices it is probably cost effective to put hundreds of GB per machine if you need to keep the query RAM based--even if the CPUs are dwarfed in cost by the RAM.

      This is one of the reasons that we need 64 bit addressability on commodity IA architecture ASAP -- RAM drives that go through an I/O subsystem add a huge overhead compared to indexing in arrays and natural data organization, as opposed to fixed blocks of bytes that have to be retrieved as a unit, with 100s++ of instructions and security models in the way of access!

    2. Re:price comparison by darkwhite · · Score: 2

      $250/gig? That's not reasonably priced. I think PC133 DRAM can cost as low as $125/gig in bulk now...

    3. Re:price comparison by Reziac · · Score: 2

      It's gone back up a bit since then, but last December, Star Components (www.star-components.com) was selling PC133 DIMMs at $55/gig. Newer RAM types were somewhat higher, but nowhere near $250/gig.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    4. Re:price comparison by haruharaharu · · Score: 2

      I just bought a Gig of DDR ECC ram for $150 from compsource, so there's a datapoint for you.

      --
      Reboot macht Frei.
    5. Re:price comparison by bataras · · Score: 1

      I work at an online service that uses LOTS of storage space. I can tell you the cheapest *fully loaded* (meaning support, salaries, power, etc.) cost of storage is $40-50 per gig. You can buy *manageable* storage arrays for ~$20 per gig (go any lower and you're doomed to continuous netops problems in production), but then you factor in the other costs to get to about double. Then depreciate it over 4 years.

    6. Re:price comparison by Noehre · · Score: 1

      If you haven't noticed, RAM prices have doubled since late December.

      A 256MB DIMM of PC2100 Registered DDR from Crucial will cost you like $75 now.

    7. Re:price comparison by Sarcasm_Orgasm · · Score: 0

      I bought 2 of those for $96 + shipping in August..however my Geforce3 is considerably less expensive..so my warm fuzzy is now gone

      --
      Special people have long socks, ride short buses, & invent witty sigs.
    8. Re:price comparison by SEE · · Score: 2
      This is one of the reasons that we need 64 bit addressability on commodity IA architecture ASAP

      Sledgehammer is coming. Sledgehammer is coming.

  18. A number of reasons it could be "cheaper"... by AtariDatacenter · · Score: 2

    Maybe he's talking in terms of TCO (total cost of ownership). Over its lifetime, RAM costs less than its hard drive counterpart?

    Another point... as long as you don't store your METADATA 100% in RAM, you can store at least your data (cached web pages) in RAM. What happens if it gets dumped? Simple. Just respider the pages you lost and go on. Small amounts of data loss can be covered.

    Okay. It may sound like I'm talking out of my ass because I am. It is really hard to cover for a statement like that. But let's talk again about the performance angle that has been covered (but with a little more emphasis on RAID disks).

    You *may* be able to get better cost/performance with LOCAL memory (not ram-based drives) than you could with a RAID array. And a raid array could never equal the performance you get with local memory. Of course, local memory could never reach the storage you achieve with a raid array. So these two paths seem to diverge (bulk storage vs speed) when comparing local DRAM to RAID'd disks.

    His statement MAY make sense, but it would have to be put into a larger context. (RAM is better than disk in X circumstances.)

  19. Hard disk is an obsolete technology by DrD8m · · Score: 1, Insightful

    Today new computers have 256 or 512 MB of RAM, which is what we had in hard disk space 10 years ago (the 386-486 era). Every day RAM gets cheaper, and IMHO a spinning disk fails too often and is far too slow to work with on overloaded servers. RAM gives us almost instant access to data and doesn't fail the way a hard disk does.
    I hope soon we'll only use some kind of RAM for everything and not a disk.

    1. Re:Hard disk is an obsolete technology by __aaaaxm1522 · · Score: 2

      Look at PDAs / handheld PCs. They use flash memory, albeit out of necessity (price, power consumption, size, etc)... but we're already beginning to see laptops incorporate solid state storage technologies. It's only a matter of time.

      Now, if we could just get around that pesky limited-write lifetime ... ;)

    2. Re:Hard disk is an obsolete technology by bigberk · · Score: 1

      IDE hard drives aren't nearly as reliable as they could be, and the manufacturers know it. I recently had an IBM tech support rep tell me that I really shouldn't run a near-top-of-the-line drive 24 hours a day in a server setting.

      Storage media without moving parts is the future. The trick is to make it nonvolatile, but cheap.

    3. Re:Hard disk is an obsolete technology by Dyolf+Knip · · Score: 4, Interesting
      So hard drives are about 10 years ahead of RAM in terms of $/MB? Sounds about right. 1GB hard drives were on the high end of normal users at the time, as is 1GB of RAM today (though I seem to recall having more than 10MB RAM at the time). Assuming the same increases in the next decade... 100GB RAM and 10TB drives. I like.

      Solid state everything would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives, which have increased only slightly in that regard.

      As someone else said, it is only a matter of time.

      --
      Dyolf Knip
  20. another reason by oyenstikker · · Score: 1, Interesting

    less mirrors = less computers = less space
    real estate is expensive.

    --
    The masses are the crack whores of religion.
    1. Re:another reason by JM_the_Great · · Score: 1

      I'm willing to bet that compared to the cost of the computers/internet connection (think of all the bandwidth they need)/staff/whatever, that land is prolly one of their lesser expenses.

      --

      --Justin Mitchell
      "2nd Place is a fancy word for losing" --Bender (Futurama)
  21. DRAM by Anonymous Coward · · Score: 0, Redundant

    The major advantage DRAM has over hard drives in Google is that when the machine reboots the memory will be cleared and then it will go scan pages again. No need to save what was in memory the previous time. Good idea for google bad idea for accounting software.

  22. The latest 2600 mag... by AltGrendel · · Score: 2

    ...has an article on this very subject. The listed article "How to hack from a RAM disk" is what you're looking for.

    --
    The simple truth is that interstellar distances will not fit into the human imagination

    - Douglas Adams

  23. Something Nobody's Mentioned by Guppy06 · · Score: 4, Interesting

    DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them each having at least two hard drive motors spinning 24/7.

  24. Bottlenecks... by percey · · Score: 3, Insightful

    More often than not with a database your bottleneck is I/O. When you run a database you cannot have enough disks, and you cannot have enough FAST disks. In order to accomplish the kind of I/O bandwidth that a place like Google is going to need, you're going to need the best EMC arrays (or perhaps an IBM Shark) money can buy. And guess what? They run you megabucks. You can't just take a bunch of SCSI disks and expect them to perform as well as Fibre Channel arrays. You gotta have controllers with multiple caches. Everyone who's never dealt with databases thinks that SCSI is the beginning and the end of hard drives, and it's so far from being the truth it's not funny.

    I've really no idea how complex the queries are or whether or not they use a relational database, but that being said, it still has to hit the disk to retrieve the data, and that's where every decently designed database's bottleneck is. Besides, Google caches all its pages. Egads! Do you have any idea how much RAM they must need for just that alone? Yes, RAM is faster. Oracle even teaches you to try to keep your frequently used tables in cache anyhow, because it's fastest. Of course they qualify that with the word "small", realizing that most people don't have the gobs of memory needed to cache large tables.

    1. Re:Bottlenecks... by Wesley+Felter · · Score: 2

      Actually, I've read that Google uses legions of machines with a few IDE drives each. The Wayback Machine uses similar hardware. Keep in mind that these are custom applications, not off-the-shelf databases, so they are written with shared-nothing clusters in mind.

  25. More importantly than the DRAM... by LatJoor · · Score: 2, Insightful

    Although it's not mentioned in the Slashdot writeup, I think that probably the most important part of this interview was the discussion of Google's business model and future. It's good to see that they're committed to not getting in over their heads with extraneous services. They've found a business model that works and they're sticking to it, rather than getting greedy and adding dumb new services that have nothing to do with searching, or "search," as he put it.

    A lot of technology companies would do very well to follow Google's example, it seems to me. They're proving that Internet services are a perfectly sound venture if the company has a sensible business model and always keeps focused on providing quality technology and services in the area that they know best.

  26. Pretty amazing, but I can see it. by dinotrac · · Score: 5, Insightful

    Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:

    1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
    2. If disk latency means that cpus spend idle time, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
    3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
    4. Which leads, of course, to lower A/C bills during the warm weather.
    5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
    6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.

    OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?

    1. Re:Pretty amazing, but I can see it. by russh347 · · Score: 2, Funny
      how many sysadmins are more than a step from insanity anyway?

      Absolutely none.

    2. Re:Pretty amazing, but I can see it. by ParisTG · · Score: 1
      OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?
      I know a few who are a few steps past it already :).
    3. Re:Pretty amazing, but I can see it. by rpack · · Score: 1

      I've had a chance to see Google's secondary facility from the other side of the fence. It's an amazing setup. Rack after Rack of 1 RU units with some 2 RU's thrown in.

    4. Re:Pretty amazing, but I can see it. by ivrcti · · Score: 1

      Only those who have already crossed that thin, sweet line......

  27. Overview of Today's Headlines by Corrado · · Score: 4, Insightful


    Another service that takes advantage of recency is something we just added called Overview of Today's Headlines. Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.


    This is a pretty cool idea. I only hope they make an RSS feed out of it so that I can use it in my company's new Portal environment. That would be really great! I love Google!

    Check it out here.

    --
    KangarooBox - We make IT simple!
    1. Re:Overview of Today's Headlines by costas · · Score: 3, Interesting

      Hmmm... I can top that.

    2. Re:Overview of Today's Headlines by mikeage · · Score: 2

      Columbia has something similar.. my future brother-in-law was a grad student writing some code for it. It's from their Natural Language Project.

      http://www.cs.columbia.edu/nlp/newsblaster

      --
      -- Is "Sig" copyrighted by www.sig.com?
  28. Why DRAM is cheaper for Google by Veteran · · Score: 1
    Put simply, "time is money": the time spent waiting for a hard drive seek costs money. Under some circumstances that time cost is bigger than the initial cost of the more expensive components.

    Additionally DRAM has a much longer time between failures than hard drives do; so maintenance costs are lower.

  29. Yes, RAM is cheaper than HDD by Ryan+Barrett · · Score: 1

    The throughput from RAM to RAM is on a totally different order of magnitude than HDD. The read time alone makes RAM more "economical" than HDD (at current memory costs). If Google were to switch to HDD, then they would need one copy of their entire DB for each search, which would mean thousands of copies of their DB. With RAM they only need a few copies, making the total cost lower with RAM.

  30. You guys are missing the point... by duffbeer703 · · Score: 4, Insightful

    DRAM requires little electricity and produces almost no heat.

    Hard disks consume large amounts of electricity, and produce large amounts of heat, since they consist of pieces of metal spinning at 7200rpm.

    Using DRAM costs quite a bit more up front, but uses less electricity and requires fewer chillers, condensers, etc. to keep cool.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
    1. Re:You guys are missing the point... by Anonymous Coward · · Score: 0

      For the same capacity, hard drive uses LESS electricity than DRAM.

      Go to Samsung website, pull out a datasheet on SDRAM and do the math. Tell me you can make 160G of DRAM array that uses less power than a hard drive.

    2. Re:You guys are missing the point... by SilentChris · · Score: 2

      What about costs to maintain redundancy (if a server goes down?)

    3. Re:You guys are missing the point... by kesuki · · Score: 3, Informative

      With over 35 DRAM chips on the American market, what good does it do to check only a single type of memory module from a single maker?
      However, since I don't want to spend the rest of the day finding the lowest-power DRAM module with the highest capacity, I will assume that the best-case scenario is 4GB of RAM using approximately the power of two HDs of any capacity. Beyond 4GB you would require either a custom DRAM NAS/HD or a second PC. However, a DRAM NAS with multiple gigabit Ethernet ports offers the most DRAM storage per watt of electricity. Still, it is at least 4x as power hungry as an 8-HD 1TB RAID server. Assume each DRAM chip in the NAS is 64 megabytes: to reach one terabyte we need about 16 thousand DRAM chips. If each chip requires even 0.1 watts to operate, they're using about 1600 watts of power. The HD server, meanwhile, may need a peak of 500+ watts, but even under load it isn't using as much as when all 8 drives spin up, so it's probably only using 400 watts total for the whole system under load.

      So it's pretty clear that power isn't an area where Google can save money using DRAM over HD. And while DRAM is solid state, and if it doesn't fail in the first 6 months it probably won't fail in the first 100 years, it is still going to become obsolete long before it fails, requiring replacement. I've also figured that at $4 a DRAM chip the cost of 1TB is $64,000, vs. $5,000 for a total-package 1TB HD server. Even if you replaced the drives every 6 months it would take 15 years before the cost of materials on HDs exceeded the cost of materials on DRAM. However, there is a cost savings. First of all, if you're mirroring the drives, that doubles the electrical and material cost of the HD storage. Second of all, that 1TB HD server is so limited by seek time that 100-megabit Ethernet is enough to saturate it, unless the data is entirely sequential (not requiring seek time), and even for sequential data a single gigabit Ethernet link is sufficient.
      That 1TB of DRAM has at worst 12 ns latency, or .000000012 seconds per seek. That provides 83,333,333 seeks per second. The only thing he was wrong about is that DRAM isn't 200,000 times faster than HD for data that requires seeks; it's on the order of millions of times more efficient. The 200,000 figure is probably based on real-world performance differences, from using DRAM vs. HD in a "real world" setting and not just on paper. That means trying to replicate the speed of DRAM with hard drives is a futile task.
      Far more futile than trying to replicate the capacity of HDs with DRAM.
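
      The back-of-envelope figures above can be reproduced as follows. All inputs (64 MB chips, 0.1 W and $4 per chip, 400 W for the HD server, 12 ns latency) are the comment's own assumptions, not datasheet values; the only choice made here is to use a binary terabyte, which is why the results come out slightly above the rounded numbers in the text.

```python
CHIP_MB = 64                # capacity per DRAM chip, as assumed in the comment
TARGET_MB = 1024 * 1024     # 1 TB expressed in MB (binary units)
WATTS_PER_CHIP = 0.1        # comment's guess, not a datasheet figure
DOLLARS_PER_CHIP = 4        # comment's guess
HD_SERVER_WATTS = 400       # comment's estimate for the 8-drive server under load
DRAM_LATENCY_S = 12e-9      # 12 ns worst-case latency, per the comment

chips = TARGET_MB // CHIP_MB
dram_watts = chips * WATTS_PER_CHIP
dram_dollars = chips * DOLLARS_PER_CHIP
power_ratio = dram_watts / HD_SERVER_WATTS
seeks_per_second = 1 / DRAM_LATENCY_S

print(f"chips needed:      {chips}")
print(f"DRAM power:        {dram_watts:.0f} W ({power_ratio:.1f}x the HD server)")
print(f"DRAM chip cost:    ${dram_dollars:,}")
print(f"DRAM seeks/second: {seeks_per_second:,.0f}")
```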

    4. Re:You guys are missing the point... by Anonymous Coward · · Score: 0

      The end result that you got is not too bad, but the methods are.

      Something to consider about your estimates is that m/b chipsets (from my understanding) are usually set to use 4 full clocks for reading above and beyond the seek even if you are only interested in one byte of data. This applies for both SDR SDRAM and DDR SDRAM implementations. RDRAM has much slower access times than SDRAM due to its protocol based nature.

      Modern hard drives have the problem where you don't only have to wait for the heads to seek to a certain cylinder, you also have to wait for the disk to rotate so that the particular sectors pass under the heads before seeking to the next location.

      Depending on what kind of search you do and how big the tables are you can easily be reading in hundreds of bytes if not a few kilobytes per seek when searching a database. Couple this with the potential to group some reads into one seek when using hard disks and you can easily chop a couple of orders of magnitude off of your end result.

      Using caching schemes where appropriate can hide much if not most of the latency issues associated with HDs, giving you the best of both worlds while still being cost effective. Then again if you need it to be fast all of the time or caching doesn't help speed things up in your situation then you are back to focusing on disk speed. (The design of some databases like Teradata is not helped much by caching like what EMC offers.)

    5. Re:You guys are missing the point... by duffbeer703 · · Score: 2

      Irrelevant in the google model.

      Google isn't using SAN arrays -- they are using thousands of distributed systems with one or two drives. In this model, they are saving money by using DRAM, if not by direct energy savings then by savings in cooling equipment.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
    6. Re:You guys are missing the point... by duffbeer703 · · Score: 2

      Traditional redundancy schemes(raid, etc) just aren't a factor in the Google system.

      Google's applications replicate data across hundreds or thousands of servers in real-time. Most of their thousands of systems can be pulled off-line with no significant data loss or impact on the overall system.

      Read some of the past Slashdot stories on Google that describe how it works. I believe there was a story in June or July that showed how they achieve great performance & redundancy on the cheap.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
  31. I'm curious by Anonymous Coward · · Score: 0

    What was the search string, so the rest of us can slay (slashdot) the mighty google.

    1. Re:I'm curious by Anonymous Coward · · Score: 0

      Sorry, the search was cached after running once. It was "from what it was to a". It only took 4.37 seconds now. The second time I tried it just now took 1.21 seconds.
      Originally, though, I assure you it took very long (>30 seconds). Just once though, afterward it always was produced quickly. I think google automatically caches any search that takes more than a certain amount of processing power.
      Oh, and I lied: it was just from the complete works of w.s., not from several etext sources.
      this is how you phrase the search: "+from +what +it +was +to +a".

      If you'd like to repeat the procedure, follow these steps:
      find a site listing the most common words in the english language. (you only care about the top ten or twenty).
      try them each until you get to one that google accepts even without a + before it.
      This is your list of stop-words.
      Write a script to the following algorithm:
      - Open input file for reading.
      - Init LongestPhrase to "".
      - Init LongestLength = 0. (number of words).
      - Init CurrentWord to "".
      - Init CurrentPhrase = "".
      - Init CurrentLength = 0;
      - For each letter in the file:
      o if it's between a-z or A-Z, add it to CurrentWord
      o if it's not, then you've just reached the end of CurrentWord. Therefore:
      --> If CurrentWord is one of your stop words, add it (after a space character) to CurrentPhrase, and increment CurrentLength. Reset CurrentWord to "".
      --> If it isn't one of your stop words, then you've just found the end of your phrase. See whether it's longer than your longest phrase, and if so, replace your LongestPhrase with it (and your LongestLength with CurrentLength). Then reset your current word, phrase, and phrase length.
      - Print LongestPhrase, "(", LongestLength, " words)".
      Obviously, things like "it's" can't be part of a phrase, even though they're common, but this'll get you a good long phrase of stop words.
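
      The steps above can be sketched in Python. This is one possible reading of the pseudocode, not the poster's actual script: it tokenizes with a regular expression instead of walking letter by letter, which also avoids resetting the phrase on runs of punctuation. The stop-word set is a hypothetical placeholder; the real list has to be found by probing Google as described.

```python
import re

# Hypothetical stop-word list; substitute the words you found by experiment.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "is", "it", "was", "from", "what"}


def longest_stop_word_phrase(text, stop_words=STOP_WORDS):
    """Return (phrase, word_count) for the longest run of consecutive stop words."""
    longest, current = [], []
    # A word is a run of a-z/A-Z, as in the pseudocode.
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.lower()
        if word in stop_words:
            current.append(word)
            if len(current) > len(longest):
                longest = list(current)
        else:
            current = []  # a non-stop word ends the current phrase
    return " ".join(longest), len(longest)


phrase, length = longest_stop_word_phrase("He changed from what it was to a new thing.")
print(f"{phrase} ({length} words)")
```

      As the caveat above notes, contractions are not handled: "it's" is split into "it" and "s", so the "it" still counts toward a phrase. Sentence boundaries are ignored as well.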

      Please thank me.

  32. Tandy 1000 by Fortyseven · · Score: 1

    Anyone remember the Tandy 1000 that had MS-DOS and Deskmate in ROM? :)

    1. Re:Tandy 1000 by Anonymous Coward · · Score: 0

      Deskmate. Damn, that takes me back.

      Thanks!

  33. The key is in the MTBF by eldurbarn · · Score: 5, Informative
    My last job was at one of the "other" search engines. We had a disk farm somewhat smaller than Google's (about 140 TB), mostly configured in RAID arrays, and we were swapping out dead bricks every few days.

    Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.

    --
    -Eldurbarn
    1. Re:The key is in the MTBF by Electrum · · Score: 1

      Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.

      Aren't the dead drives covered by the manufacturer's warranty?

    2. Re:The key is in the MTBF by Anonymous Coward · · Score: 0

      you still have to do the running around to replace them

    3. Re:The key is in the MTBF by chiph · · Score: 0

      It's not the cost of the drive, or the length of the warranty, it's the cost of paying someone to replace them as they fail.

      Another way to look at it:

      A Western-Digital Caviar WD1200BB has a MTBF of 500,000 hours. With 10,000 servers, that means *on average* they'll lose a drive (and the associated server) every 50 hours, or roughly every 2 days. Sure they could use Ghost or Drive Image to reload a new drive within 2-3 hours, but if the company could avoid all that work by using DRAM and network booting from a RAID 1+0 array, they'd do that.
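
      The arithmetic behind that estimate, as a quick sketch. It assumes one drive per server, independent failures, and a constant failure rate equal to 1/MTBF, which is a simplification of how real drives age.

```python
MTBF_HOURS = 500_000   # per-drive MTBF quoted above
DRIVES = 10_000        # one drive per server (assumption)

# With independent drives, the fleet as a whole fails DRIVES times as often.
hours_between_failures = MTBF_HOURS / DRIVES
days_between_failures = hours_between_failures / 24
failures_per_year = (24 * 365) / hours_between_failures

print(f"one failure every {hours_between_failures:.0f} hours "
      f"(~{days_between_failures:.1f} days)")
print(f"about {failures_per_year:.0f} drive replacements per year")
```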

      If they've got growth like they say, then there's probably a team in each of their datacenters configuring machines as fast as they can. They must have a rack of cordless screwdrivers on constant charge.

      Chip H.

    4. Re:The key is in the MTBF by RandyOo · · Score: 1

      You said:

      They must have a rack of cordless screwdrivers on constant charge

      What the heck is a cordless screwdriver? I've never seen a screwdriver with a cord...

    5. Re:The key is in the MTBF by whaley · · Score: 1

      Electrical screwdrivers used in factories to assemble PCs are not cordless (at least not when I looked). I can imagine running around a server room with an electrical screwdriver attached to 220V (or 110V ;-) is not much fun.

    6. Re:The key is in the MTBF by Alsee · · Score: 2

      What the heck is a cordless screwdriver?

      Think cordless power drill. A motor can spin a screw in or out in under a second. Not only does it save time, but it prevents fatigue. You don't want to turn dozens or hundreds of screws by hand on a regular basis.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  34. not just web pages by gimpboy · · Score: 1
    dont forget the following

    images.google

    catalogs.google

    groups.google

    the catalog part is still in beta, but it's really amazing. when you do a search it actually highlights the words within what appear to be images. really cool. i could see how the three above could easily up their capacity to 50tb.

    --
    -- john
    1. Re:not just web pages by Metrol · · Score: 2

      the catalog part is still in beta, but it's really amazing.

      I hadn't really looked at that part of Google until your post. Based on a couple of searches I did, didn't seem all that amazing to me. More like white knuckle frightening!!

      This must be that level of technology that is too easily taken for magic. There are just too many perfectly rational reasons why this "shouldn't" work at all!

      --
      The line must be drawn here. This far. No further.
    2. Re:not just web pages by Anonymous Coward · · Score: 0

      oh yes, because OCR is OH so magical. hey everyone, watch this guy pull a rabbit out of his @$$. look... nothing up his... oh wait... nmd.

  35. backups. by gimpboy · · Score: 1

    i would imagine they have backups of some sort. even if its just dram rsyncing across the internet.

    --
    -- john
  36. Google is great... by Calle+Ballz · · Score: 2

    ...but they'll get a million times better as soon as they'll allow boolean searches. Man sometimes it's frustrating!!

    1. Re:Google is great... by russianspy · · Score: 2, Insightful

      They do. Read the guide. You can include parentheses, AND, and OR. I don't remember if they allow XOR and others. Oh... They allow negation as well.

    2. Re:Google is great... by SpinyNorman · · Score: 3, Informative

      Um.. they do.

      AND is by default
      OR is OR
      NOT is -

      I don't think parentheses for grouping work though (they don't mention them), so you can't do more complex queries, but you can certainly do:

      A AND (B OR C) AND !D

      Which would be: A B OR C -D

    3. Re:Google is great... by J'raxis · · Score: 1

      And exact phrases by "quoting it." However, it seems once the search runs out of exact phrases it falls back to a simple AND match just so it can provide you with something.

  37. DRAM probably is cheaper...Here's why. by Bowie+J.+Poag · · Score: 3, Informative



    It's not a fair comparison to put 1GB worth of DRAM on one side of the scale and 1GB worth of physical storage on the other. The hard disk will obviously come out to be the cheaper of the two. However, for a company like Google, which undoubtedly uses RAID technology for storage, you're effectively not getting the same "bang for your buck" as you would with a JBOD array. In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.

    Mind you, that's not to say that RAID is a bad technology... heh, hardly. It's just that you can't make a 1-to-1 comparison from DRAM to physical storage without taking into account the storage methods employed by each.

    Cheers

    --
    Bowie J. Poag

    1. Re:DRAM probably is cheaper...Here's why. by foobar104 · · Score: 2

      In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.

      That isn't true at all. If you wanted to, you could mirror all of your data on two separate JBODs-- RAID level 1-- but that's not efficient. If you use RAID 3 or RAID 5, you'll never use more than 33% of your storage for parity data. As the size of your RAID set increases, the percent allocated for parity data goes down. In a 10-disk set, one disk is used for parity (in the case of RAID-3), which is only 10% of your total storage. (In the case of RAID-5, you'd still use only 10%, but you'd use 10% of each disk instead of one whole disk.)
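
      The overhead figures above follow from the fact that single-parity RAID (levels 3, 4 and 5) spends one disk's worth of capacity on parity regardless of set size. A small sketch, with `parity_overhead` as a made-up helper name:

```python
def parity_overhead(num_disks: int) -> float:
    """Fraction of raw capacity used for parity in a single-parity RAID set."""
    if num_disks < 3:
        raise ValueError("single-parity RAID needs at least 3 disks")
    return 1 / num_disks


MIRROR_OVERHEAD = 0.5  # RAID 1: half of the raw capacity holds the copy

for n in (3, 5, 10):
    print(f"{n:2d} disks: {parity_overhead(n):.1%} parity, "
          f"{1 - parity_overhead(n):.1%} usable")
print(f"mirroring: {MIRROR_OVERHEAD:.0%} overhead")
```

      The 33% worst case is the three-disk minimum; only mirroring reaches the 2TB-for-1TB ratio described in the parent post.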

    2. Re:DRAM probably is cheaper...Here's why. by Bowie+J.+Poag · · Score: 2



      The example I gave was meant to demonstrate a point, not to be pedantic and overly technical. I'm well aware of the different pros and cons of RAID types. :) I do it for a living.

      FYI, there is no such thing as a "parity disk" when it comes to RAID. I think you might be confusing parity with the notion of a quorum disk, which is something very different. Parity is distributed throughout the array, and changes dynamically as data gets poured into the set. Having a "parity disk" would be contradictory to the whole point of RAID, as it represents a single point of failure for your storage. Not good.

      Also, Google is a HA cluster. I can guarantee you they aren't using JBODs to house their data, as you've implied.

      Cheers,

      --
      Bowie J. Poag

    3. Re:DRAM probably is cheaper...Here's why. by Junta · · Score: 2

      Actually, in RAID-4 (maybe 3, don't remember) there is a parity disk. The reason why parity info is distributed in RAID-5 is for performance considerations. Having a parity drive is no more a single point of failure than distributed parity. So what if you lose the parity disk? Read and write operations will continue to work (in fact, degraded mode in this circumstance would actually improve write performance, as you now have a RAID-0). Stick in your spare (or switch to a hot-spare), and the array reconstructs the parity disk just like any other.

      --
      XML is like violence. If it doesn't solve the problem, use more.
    4. Re:DRAM probably is cheaper...Here's why. by foobar104 · · Score: 2

      I'm well aware of the different pros and cons of RAID types. :) I do it for a living.

      Are you sure about that?

      FYI, there is no such thing as a "parity disk" when it comes to RAID.

      In a RAID-3 implementation, parity data is generated for each stripe unit and stored on one disk of the array. In RAID-5, the parity data is stored across all disks of the array, a little bit in every stripe unit. (RAID-4 implements parity on the block level instead of the stripe level; it doesn't really have any advantages, so it's almost never used.)

      "Quorum disks" are, as you said, something entirely else. They're related to a particular type of implementation of failover clustering, widely considered to be inferior to true highly available systems.

      Perhaps you're confusing RAID with high availability. That would explain your response, I think.

      In short, you're either wrong, or your post was so unclear that you might as well be wrong.

      Having a "parity disk" would be contradictory to the whole point of RAID, as it represents a single point of failure for your storage. Not good.

      False. Consider a three-way parity set: disks one and two contain data, and three contains parity. If you lose disk 1, you can reconstruct it from disk 2 XOR'd (or whatever; the method depends on the parity generation scheme and is irrelevant) with the parity disk, and vice-versa. And if you lose the parity disk, you reconstruct it from disks 1 and 2 XOR'd (or whatever) together. There is no single point of failure there.
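
      The three-way example above, sketched with XOR as the parity function. That is an assumption; as noted, the real method depends on the parity generation scheme. The helper name and sample data are invented for illustration.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

disk1 = b"GOOGLE"
disk2 = b"SEARCH"
parity = xor_blocks(disk1, disk2)   # what the parity disk would hold

# Lose any one of the three and rebuild it from the other two.
assert xor_blocks(disk2, parity) == disk1
assert xor_blocks(disk1, parity) == disk2
assert xor_blocks(disk1, disk2) == parity
print("any single disk can be rebuilt from the remaining two")
```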

      In fact, set rebuilds are significantly simpler in a RAID-3 implementation than they are in RAID-5.

      I ask again: are you absolutely sure that you do this for a living?

    5. Re:DRAM probably is cheaper...Here's why. by Bowie+J.+Poag · · Score: 2



      Eeek, I said there's "No such thing as a parity disk in RAID" up there? Egads.. :)

      For the record, yes, I meant HA. Not RAID in that context. I was attempting to point out that Google's choice of storage strategy would depend largely on the need to eliminate singular points of failure.

      To continue the discussion, RAID 3 would be a rather poor choice of RAID type for an HA cluster. In RAID 3, parity needs to be handled sequentially, whereas in RAID 5, read/write operations can happen simultaneously since parity isn't localized to any one particular drive. The marginal speed advantage RAID 3 offers over 5 is seldom enough for a typical admin to justify in the long run. It's only really seen in situations where overall latency takes a backseat to speedy access to huge files. That's been my experience, at least.

      And yes, I'm absolutely sure I do this for a living. :) I'm also absolutely sure I've had pneumonia & bronchitis for the past week, high fevers and all. Ended up in Urgent Care with a 104.6'F the night before. Hope that explains my storage faux-pas. :)

      --
      Bowie J. Poag

    6. Re:DRAM probably is cheaper...Here's why. by foobar104 · · Score: 2

      Okay, I knew there had to be some explanation. ;-)

      We use RAID-3 exclusively, because our stuff requires deterministic read speeds. It's also a lot simpler to design software RAID-3 implementations because the parity generation and the rebuild algorithms are so much simpler.

      We're going to start using RAID-5 for some of our new applications, though, because we just signed up to bundle HDS 9960 storage systems with our application. So that's going to be kinda exciting.

      In my experience, obviously different from yours, RAID-3 and RAID-5 come up about 50/50. It just depends on what you do with it.

  38. Just Think by Waffle+Iron · · Score: 1, Redundant

    Every piece of drivel that you spew forth and put on the web is going to be permanently enshrined in its own little piece of DRAM at Google (probably including this stupid comment). Each bit of each and every word ever put on the web is destined to be endlessly and pointlessly refreshed every few milliseconds, expending its own minuscule amount of energy and waiting in vain for that one stray alpha particle to cause a soft error and finally put it out of its misery. It seems like something Andy Warhol would have predicted.

  39. One word - Latency by prestwich · · Score: 1

    Hard drive latency is too high. If they used hard drives the machines would be sitting there most of the time waiting for the drive to find things.

  40. Re: Power Chord- by kuhneng · · Score: 2, Funny

    The sound a Mac makes when you turn it on.

  41. The Google feature I want by Hanzie · · Score: 4, Funny

    See that "mature content filter"?

    How about a "mature content ONLY search"?

    --
    ********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
    1. Re:The Google feature I want by hayne · · Score: 0, Offtopic
      How about a "mature content ONLY search"?

      % google -noFilter | google -v

    2. Re:The Google feature I want by apirkle · · Score: 1

      This would be very easy to do. Write a script to grab the page with the filter on and off for the same search query (you probably want to set it to return 100 results), and just diff the two pages.

      *scurries off to write pr0n-search.pl*

  42. That's why I use a Mac... by Anonymous Coward · · Score: 0

    Macs have a studied and proven lower TCO than Windows PCs. But the ability to buy some POS sub $1000 box rules over everything.

    1. Re:That's why I use a Mac... by Anonymous Coward · · Score: 0

      Good fucking God. Will you Macolytes PLEASE stop quoting those idiotic studies that were done when Windows 3.1 roamed the earth?

      Macs have a TCO of infinity because they can't run the software that you want to run.

  43. Innumeracy and price comparisons by Alomex · · Score: 2

    One would have expected /. nerds could do better at price comparisons than what we have seen so far.

    Quick, which is the better price: a 1994 Ford Fiesta at $10,000 or a brand new Ferrari at $12,000?

    Clearly the Ferrari is a better deal. To do a proper price comparison you have to look beyond the sticker price alone.

    What is the performance you get? Resale value? Maintenance cost? Operation costs?

    If all you wanted to buy is megabytes of storage you would be better off buying backup tapes. They are hard to beat price-wise.

    But in all likelihood you need to store that data for some purpose, so depending on frequency of access, latency, and total cost of operation (tapes are operator/robot mounted), alternative solutions with a higher sticker price might well end up being cheaper.

    What Eric Schmidt claims is that if you have a ton of data and you are accessing it all the time DRAM is more cost effective than (a) a large mirrored RAID array server or (b) a zillion tapes being mounted by operators.

    1. Re:Innumeracy and price comparisons by eli867 · · Score: 1

      You've obviously never seen how expensive it is to get a tune up on a Ferrari.

      eli

  44. Newsnow did it first... by Sits · · Score: 1

    ...and they also do it faster but Newsnow isn't nearly as big or as popular as Google. They also seem to aggregate more sources than google (slashdot is aggregated for example).

  45. TOC, RAM vs. Steel Platter by eyepeepackets · · Score: 4, Informative

    Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.

    These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)

    Performance was absolutely unreal: more than twice the speed of SCSI, in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*

    If you have to do anything requiring lots of I/O (like database,) you _really_ do want one of these things or something like it.

    Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives) but if you are at all measuring time, then everything else loses the comparison; if you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units because they truly are part of the cost (read: necessary).

    Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.

    As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.

    --
    Everything in the Universe sucks: It's the law!
    1. Re:TOC, RAM vs. Steel Platter by psych031337 · · Score: 2

      This thing would really rock if you could use it to boot up your machine. Imagine an instant OS. Rebooting in less than ten seconds.

      I'm off to change my pants.

      --
      +++ath0
  46. wrong... 10watts for 1GB reg. ECC SDRAM (PC133) by Lazy+Jones · · Score: 2

    ...

    --
    "I love my job, but I hate talking to people like you" (Freddie Mercury)
  47. Not really cheaper in the long run... by cdrj · · Score: 1

    I would wager that by the time Google recovers the overall cost of buying this, hard drives will have advanced to the point where they are on par with today's RAM...

    1. Re:Not really cheaper in the long run... by Anonymous Coward · · Score: 0

      This is one of the more idiotic comments ever made on slashdot.

      HD access times are measured in milliseconds. RAM access times are generally measured in tens of nanoseconds. That difference is five orders of magnitude. HDs aren't catching up with DRAM any time soon.
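
      A quick check of the "five orders of magnitude" figure. The two access times below are illustrative assumptions picked from the ranges mentioned (a few milliseconds, tens of nanoseconds), not measurements.

```python
import math

disk_access_s = 5e-3    # assumed: 5 ms per random disk access
ram_access_s = 50e-9    # assumed: 50 ns per RAM access

ratio = disk_access_s / ram_access_s
orders_of_magnitude = math.log10(ratio)

print(f"disk is about {ratio:,.0f}x slower "
      f"({orders_of_magnitude:.1f} orders of magnitude)")
```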

  48. Index space? by SuperKendall · · Score: 2

    That's a great calculation, but just figures the space needed for caching the raw data.

    What about the indexes required to actually access that data in a timely manner? Once you factor in the extra stuff needed to actually make it a viable search engine, you could easily imagine a PB or more of storage was required.

    As for the other poster going on about compressing the data - I doubt they'd want to compress the data when all they are concerned about is raw speed of processing requests!

    .

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
    1. Re:Index space? by spiro_killglance · · Score: 3, Insightful


      I don't know how Google does it. But typically the main overhead is the inverted file: for every word on every page, you just need the number of the page it was in and the word position on that page. So Google needs around 8-12 bytes per (non-stoplisted) word.
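
      A toy version of that structure, to show where the bytes go. This is only a sketch: the stoplist, the sample pages, and the choice of two 32-bit integers per posting (8 bytes, the low end of the estimate) are assumptions, and nothing here reflects how Google actually stores it.

```python
import struct
from collections import defaultdict

STOPLIST = {"the", "a", "of", "and"}   # hypothetical stoplist

def build_inverted_index(pages):
    """Map each word to a list of (page_number, word_position) postings."""
    index = defaultdict(list)
    for page_no, text in enumerate(pages):
        for pos, word in enumerate(text.lower().split()):
            if word not in STOPLIST:
                index[word].append((page_no, pos))
    return index

pages = ["the cost of dram", "dram beats disk latency"]
index = build_inverted_index(pages)
print(index["dram"])   # every place "dram" appears, as (page, position)

# Two unsigned 32-bit integers per posting.
posting_size = struct.calcsize("<II")
total_postings = sum(len(postings) for postings in index.values())
print(total_postings, "postings,", total_postings * posting_size, "bytes")
```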

  49. quick math by Anonymous Coward · · Score: 1, Insightful


    Let's assume that Google needs 100 TB of data. Possibly not correct, but probably not off by more than an order of magnitude either high or low.

    Let's just take a look at sharky's RAM price guide, and we see that a 512 MB module costs about $75, or $125 if it's ECC. So one gig of RAM costs between $150 and $250.

    Assuming they used some sort of non-standard computer system that supports vast quantities of RAM (so the system price is almost entirely dependent on RAM prices) then we find that one TB costs about $200,000 or $300,000. This assumes that a box which can hold 1 TB of RAM (2,000 of the 512 MB modules) costs about $50,000. Perhaps not beyond reason. Maybe it costs more, but once again it should be within an order of magnitude (no more than a million $ or so).

    If they have 100 TB of stuff they need to store then that comes to a grand total of $30,000,000 to store it in ECC dram. Not unreasonable.

    Of course, if the database size is only about 10 TB, then the total cost is more like $3,000,000 which is pennies for Google (probably). Basically, RAM is not so expensive that huge quantities of data cannot be stored in it, if one is determined.
    In addition, the power dissipation would be very low, fewer power supplies, fewer servers of every sort, etc.... Do you think you could build a massive fiber channel RAID array that would serve Google's needs for $3-30 million?
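
    The estimate above can be reproduced in a few lines. The module prices and the $50,000 chassis are the figures quoted in this comment; treating 1 TB as 1,000 GB is an assumption made to match the "2,000 modules" count.

```python
MODULE_GB = 0.5                     # one 512 MB module
PRICE_PLAIN, PRICE_ECC = 75, 125    # dollars per module, as quoted
CHASSIS_PER_TB = 50_000             # assumed cost of a box holding 1 TB

def cost_per_tb(module_price, gb_per_tb=1000):
    """Dollars for 1 TB of RAM plus the box that holds it."""
    modules = gb_per_tb / MODULE_GB          # 2,000 modules per TB
    return modules * module_price + CHASSIS_PER_TB

for terabytes in (10, 100):
    low = terabytes * cost_per_tb(PRICE_PLAIN)
    high = terabytes * cost_per_tb(PRICE_ECC)
    print(f"{terabytes} TB: ${low:,.0f} (plain) to ${high:,.0f} (ECC)")
```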

    My $.02

    Tyler Ward
    tjw19@columbia.edu

  50. They must mean FIXED HEAD 'disks' v DRAM by Mongoose · · Score: 2

    Fixed-head hard drives have no seek time, since every track has its own head. That's also why you can't get them at CompUSA (expensive).

  51. quote by JimBobJoe · · Score: 1

    I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

    Does anyone remember the SNL skit concerning a bank which specialized in making change?

    "From a dollar, you can get 20 nickles. You could get 10 nickles and 50 pennies--if you want. How do we do this...?volume!"

    That's what I thought of.

  52. trickle down by Anonymous Coward · · Score: 0

    this means that we all will have google (and suchlike) to thank in a few years when we're all using computers with no moving parts. :)

  53. Re:fist post by Anonymous Coward · · Score: 0

    I'd love to stick my fist in your vagina. Where do you live? I'll be right over.

  54. Why DRAM is cheaper by Animats · · Score: 2
    The price advantage of storing the data in DRAM comes from needing fewer copies. A disk-based search engine like Inktomi has many duplicated clusters, each with a copy of all the data, to get the traffic capacity needed.

    Also, Google's searchable data is considerably smaller than the total size of the pages searched, even excluding the images. Read their white papers. And I doubt that they store the cached pages and images in DRAM. Those don't get hit that often.

  55. Lay off those power chords! by Mr+Z · · Score: 0, Offtopic

    Cranking the amp too loud is bad for the computers. Or did you mean power cord?

    --Joe
  56. I've read the comments, but no one answered by Catbeller · · Score: 2

    at least completely.

    I had the same question myself over the years. Especially recently, as memory prices dropped through the floor.

    Linux has the option of loading itself into a ramdrive, and that's great. But why not Windows 98 or ME? Is it because it was technically hard, or was it instead that the concept was too alien to the developers? (One ALWAYS uses disk! Don't bother me!)

    RAM is faster -- always. I realize that you can't live off of RAM alone, but at the very least the swap file shouldn't be on disk. I've spent too much time in the past ten years listening to hard drives slice meat as I waited for Windows to move pages off of and into RAM.

    Well, if XP provides the option, fine. But I won't use XP. Don't like subscription OSes. Maybe the 2K version permits it. I'll try.

    Wonder how much of computing is just bad habits?

    1. Re:I've read the comments, but no one answered by Oroborus · · Score: 1

      I'll try to give a clearer answer then.

      Windows doesn't load itself into a ramdisk because that's not necessarily the most efficient approach. When an OS is running, it's not constantly using the entirety of its code. Windows, for example, will be heavily utilising the DLLs for the API, but likely won't use the cryptographic DLLs more than once a week or so (if that). So loading the whole OS into RAM effectively wastes the RAM that holds the rarely used sections.

      So the best way to use RAM is to let the memory manager do what it's designed to do. The memory manager will load the parts of the OS that are most frequently used into RAM, and keep them there as long as they remain the most often accessed. Similarly for the most frequently accessed programs.

      The exception to this rule is when you want low latency (i.e. you don't use a program much, but when you do you want instant response), or when you have a large program that is very tightly integrated or a large body of data (like a database, where you want low latency for all pieces, even if the block they're in is accessed once a week; or image editing or CAD work, where pieces of your work can start getting paged out because of inefficient memory management).

      Hope that makes it clearer.

    2. Re:I've read the comments, but no one answered by Querty · · Score: 1

      but at the very least the swap file shouldn't be on disk

      Think about it:

      1. The OS runs out of RAM
      2. So it starts to use the swap file/partition to swap out full, but unused areas of RAM.
      3. This frees up some RAM
      4. The OS is happy

      Now what happens when the swapfile is on a RAM disk? Before you went on a rant, did you even consider that it might help if you knew what "swap" is?

      B.T.W. About XP's feature of loading the kernel into RAM, what do you think the following Linux kernel bootup message means?

      Memory: 384320k/393136k available (1105k kernel code, 8428k reserved, 306k data, 232k init, 0k highmem)

      Jeez, I thought /. was for nerds. My mum knows this stuff ;-)

    3. Re:I've read the comments, but no one answered by Anonymous Coward · · Score: 0

      You fucking idiot, if you don't want swap on disk then disable swap. Do you even know what a swap file is?

    4. Re:I've read the comments, but no one answered by um...+Lucas · · Score: 2

      but at the very least the swap file shouldn't be on disk. I've spent too much time in the past ten years listening to hard drives slice meat as I waited for Windows to move pages off of and into RAM

      Then add RAM. The entire reason for the swap file is because you don't have enough RAM. Thus, the OS is set up to use the hard drive as a slower back up... It'd be a waste to store your swap file on a RAM disk. Just add 1/2 a gigabyte or a full one and turn the thing off.

  57. Take a BUSINESS perspective (yes, it's painful...) by Colz+Grigor · · Score: 1

    Many of you are comparing DRAM to HDs solely on an overall price scheme (DRAM is $150/GB and HDs are $3/GB). Some of you have taken this a step further and compared things based on cost (DRAM may be 50x more expensive, but it requires 1/10th the power, so over a period of time, the DRAM will wind up costing less). Ultimately, anyone with a good sense of business will look at the perceived value or return on investment (ROI) of the proposed solution over a period of time, considering the time-value of money. This is called a net present value (NPV).

    In order to achieve the lowest possible NPV, a high-tech financier will break their disparate technologies into a common measure and place a value on the item utilizing that measure. In other words, they'll compare DRAM and hard drives on a price per performance basis.

    At Google, I'm positive they're far more interested in latency (gotta get the fastest hit times, right?), so the calculation they use to compare disparate technologies will likely be price per gigabyte cross latency. Since the latency on DRAM is much less than the latency on HDs, the 150x price suddenly flips, and we find that DRAM is 6x more valuable than HDs.

    But they've still got to put that into a spreadsheet and add up all the associated costs for each solution (including maintenance, power, expected failure rates and costs associated with the failure (including costs associated to the loss of information stored on volatile memory), etc.) over a period of time.

    It's an extremely complex calculation, from a business perspective, so I doubt that Mr. Schmidt has his head up his ass.

    For Google's purposes and given Google's attitudes toward generating a ROI, DRAM costs less than HDs. This does not mean that the same would be true for Akamai or NetworkAppliance or you.

    ::Colz Grigor

    --

  58. RAM Nodes by GrEp · · Score: 2

    In many clusters today, like KLAT2, they only use hard drives for the root nodes, and the other 98% of nodes use 2GB of RAM.

    This saves you at least $150 per slave node by not buying a hard drive, thousands for having to deal with fewer hard drive failures, and access times are orders of magnitude better.

    Let's do the math. 512MB of PC133 on pricewatch today was $67. For 2GB of RAM that comes out to $268 per node. For a terabyte (2 modules * $67 * 1000GB) = $134,000.

    That blows my mind. A small research lab can now own a terabyte of PC133 for under $150,000. Man, do I feel old.

    --

    bash-2.04$
    bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
  59. I wish I was a Sales droid! by ToasterTester · · Score: 1

    I'd sure like to be the salesman selling Google their hardware. 10,000 RAM-heavy servers, KA-CHING, KA-CHING, KA-CHING, KA-CHING, KA-CHING! My eyes are filled with dollar signs of massive commissions.

    What the article doesn't point out is how they are doing the RAM thing. Are they buying solid state drives (physically look like hard drives, but are nothing but RAM) or are they just cramming RAM in the servers so the database and its data is all in RAM? That's common to do with databases for performance.

  60. Latency and bandwidth by Zeinfeld · · Score: 2
    The key to the cost comparison is that RAM supports more queries per second than disk. Supporting the number of queries per second using disk would require a lot more duplicates of the data to support the query rate.

    The cost differential between RAM and disk has been eroding for some time, particularly if you compare RAM with SCSI disk. While the price of IDE had dropped, SCSI is still premium priced for the business market, even though there is no reason why a SCSI controller should cost a cent more than IDE.

    An 80GB SCSI-160 drive costs $800; RAM costs $150 for a 512MB DIMM. So disk costs $10 per GB compared to $300.

    The problem with the raw comparison is that you still need a lot of RAM to service a large disk, caching etc. There is also a limit on the amount of disk data one CPU can effectively manage. From experience I can assure folk that that limit is certainly less than 80GB if the lookups are frequent!

    So when you add the cost of a CPU and box into the equation the RAM solution is going to look much better. I doubt that a single CPU could effectively manage more than 4GB of disk data, but 4GB of RAM data is quite viable. And you probably need at least 1GB of RAM to support the disk data in any case, so the all-RAM solution looks good.

    For most database applications RAM wins hands down. On top of the cost of the disk you have to count on

    • The cost of an Oracle license ($100K +++)
    • The cost of a whiny Oracle DBA ($100K/pa)
    • The cost of an equally whiny SQL programmer to interface your code to the crackpot SQL data model ($100K/pa)
    • The cost of licenses for GUI based schema design tools etc. etc. for the whiny SQL types
    • Trips to CostCo for Maalox

    The main problem for the RAM route is getting persistence on transactions. So you need some secondary storage in case of power failure or disaster. This could be tape, but ironically disk is cheaper to run these days than almost all tape systems. A 40GB cartridge for a tape drive can easily cost $150, which is more than an IDE disk drive that outperforms it on practically every level (probably even longevity).

    The key is that you use your secondary storage to write out the transaction log, you don't attempt to maintain the data structure on disk like SQL databases do. For high reliability you use a complete duplicate of the system to provide your first level backup with disaster recovery at a remote site.
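
    A minimal sketch of that idea: the data lives in a plain in-memory dict, and the disk only ever sees an append-only transaction log that is replayed on startup. The class name, the JSON-lines log format, and the file path are all assumptions made for illustration, not a description of any real product.

```python
import json
import os
import tempfile

class LoggedStore:
    """Dict kept entirely in RAM; every write is appended to a log on disk."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):
            self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Rebuild the in-memory state by re-applying each logged write in order.
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # Log first and force it to disk, then update RAM.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)   # served from RAM, no disk access

    def close(self):
        self.log.close()

path = os.path.join(tempfile.mkdtemp(), "txn.log")
store = LoggedStore(path)
store.put("slashdot.org", 42)
store.close()

recovered = LoggedStore(path)       # simulates a restart after power loss
print(recovered.get("slashdot.org"))
recovered.close()
```

    Reads never touch the disk, and writes are sequential appends, which is the access pattern disks handle best.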

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  61. Power Consumption by Anonymous Coward · · Score: 0

    The difference in power consumption is huge between HDDs and RAM. When you consider that modern memory uses something along the lines of up to 600mW at 1V-5V per DIMM at most, it is a lot cheaper than buying a data warehouse of HDDs.

  62. Re:Take a BUSINESS perspective (yes, it's painful. by Colz+Grigor · · Score: 3, Insightful
    One other follow-up:

    Google will also likely break their technology into three components:

    spidering and indexing

    searching

    caching

    Each of the financial analysts for the business groups responsible for each aspect of Google's technology may calculate the value of DRAM vs. HD differently. For searching, latency is extremely critical, but it's not so critical for caching, and there may be some physical problems with solely using DRAM for indexing.

    That being said, I would expect Google to use HDs for spidering and indexing, DRAM for searching, and HDs for caching. Mr. Schmidt was probably only discussing technology on the most visible component of Google's technologies: searching.

    ::Colz Grigor

  63. Price performance by Anonymous Coward · · Score: 0
    The crucial question is overall price performance. Google has an inverted index for all the web data, and they have really high throughput requirements. This means that bandwidth is crucial.


    The average disk can sustain between 100-200 IOPs, while the average memory module can sustain about 10,000,000 IOPs (100ns latency). At $120/disk, this works out to $0.60-$1.20/IOPs, and at $250/GB for memory, this works out to $0.000025/IOPs.


    Google currently claims to index about 2G pages. If one assumes on average each page is 4KB, and that the inverted index takes half the space of the original text, then this means 4TB of index. 4TB of RAM at $250/GB is 4K memory modules for $1M. Assuming their motherboards can hold 2GB each, this means 2K machines at perhaps $120 each for another $250K. Now, those 4K memory modules on 2K motherboards can sustain something like 40G IOPs. $1M of disk is roughly 8K disks for 1.6M IOPs. In a real system, load is never evenly distributed so you are almost never able to approach the theoretical limit.
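
    The sizing above, worked through in code. The prices and rates are the ones quoted in this comment; 1 GB modules and decimal units (1 GB = 10^9 bytes) are assumptions made to match the "4K memory modules" count.

```python
DISK_PRICE = 120              # dollars per disk
DISK_IOPS = (100, 200)        # random IOs per second per disk
RAM_PRICE_PER_GB = 250        # dollars, assuming 1 GB modules
RAM_IOPS = 10_000_000         # one access per 100 ns

# Dollars per unit of random-IO capacity.
disk_cost_per_iop = [DISK_PRICE / iops for iops in DISK_IOPS]
ram_cost_per_iop = RAM_PRICE_PER_GB / RAM_IOPS

# Index size: 2G pages at 4 KB each, index half the size of the text.
index_bytes = 2_000_000_000 * 4_000 // 2
modules = index_bytes // 1_000_000_000
ram_cost = modules * RAM_PRICE_PER_GB
ram_total_iops = modules * RAM_IOPS

# Spend the same money on disks instead, at the optimistic 200 IOPs each.
disks = ram_cost // DISK_PRICE
disk_total_iops = disks * DISK_IOPS[1]

print(f"per IOP: disk ${disk_cost_per_iop[1]:.2f}-${disk_cost_per_iop[0]:.2f}, "
      f"RAM ${ram_cost_per_iop:.6f}")
print(f"index: {index_bytes // 10**12} TB in {modules:,} modules for ${ram_cost:,}")
print(f"RAM:  {ram_total_iops:,} IOPs")
print(f"disk: {disks:,} disks, {disk_total_iops:,} IOPs")
```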


    For more details on the (original) Google implementation, please see The Anatomy of a Large-Scale Hypertextual Web Search Engine, by Sergey Brin and Larry Page.


    From dim memory, to do a search, you need to:

    • look up the word (or words) in a dictionary
    • from the dictionary you get a pointer to the list of all word appearances. (Actually, Google keeps more than one list, and it only traverses as much of the list as it needs to.)
    • lookup the document's page rank
    • rank the hits
    • lookup the document and generate the hit

    Each of the lookups (dictionary, inverted index, page rank, document) is a random access (IOP). So, to make a long story short, memory is cheaper for Google because throughput (and latency) is critical to their business and their access patterns are generally random and the cost of enough memory to hold the index is less than the comparable cost of enough disk to support the IO rates they require.
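
    The steps above can be sketched with a few in-memory tables, counting one random access per lookup. Everything here (table contents, names, the simple "all words must match" rule) is hypothetical and only meant to show why a single query fans out into many random accesses.

```python
# Toy tables; all contents are made up for illustration.
dictionary = {"dram": 0, "disk": 1}                  # word -> word id
postings = {0: [10, 11, 12], 1: [11, 12, 13]}        # word id -> doc ids
page_rank = {10: 0.2, 11: 0.9, 12: 0.5, 13: 0.1}     # doc id -> rank
documents = {10: "RAM prices", 11: "DRAM vs disk",
             12: "Disk latency and DRAM", 13: "RAID levels"}

def search(query, top_k=2):
    """Return (hits, number_of_random_lookups) for a query."""
    lookups = 0
    doc_sets = []
    for word in query.lower().split():
        word_id = dictionary.get(word)       # 1. dictionary lookup
        lookups += 1
        if word_id is None:
            return [], lookups
        doc_sets.append(set(postings[word_id]))   # 2. inverted index lookup
        lookups += 1
    if not doc_sets:
        return [], lookups
    candidates = set.intersection(*doc_sets)
    lookups += len(candidates)               # 3. one page rank lookup per candidate
    ranked = sorted(candidates, key=lambda d: page_rank[d], reverse=True)  # 4. rank
    hits = [documents[d] for d in ranked[:top_k]]   # 5. fetch each document
    lookups += len(hits)
    return hits, lookups

hits, lookups = search("dram disk")
print(hits, "using", lookups, "random lookups")
```

    Even this two-word toy query needs eight random accesses; at a few milliseconds each on disk, that adds up quickly under load.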

    Cheers,

    Carl Staelin
  64. PCs? by Anne+Thwacks · · Score: 1
    I am using FreeBSD to write this, but surely Google can justify using mainframes - maybe with Linux? This has to be a case for more powerful architecture.

    Does NetBSD run on IBM big iron? If not, there's always Sun kit with NetBSD. They don't have to use Linux (or DOS 4.0).

    Or maybe Google are stupid?

    --
    Sent from my ASR33 using ASCII
    1. Re:PCs? by Anonymous Coward · · Score: 0
      I am using FreeBSD to write this, but surely Google can justify using mainframes - maybe with Linux? This has to be a case for more powerful architecture.

      They do not want to have the more powerful architecture, they want to have the cheapest solution that does the job and is also flexible enough.

      Or maybe Google are stupid?

      No. They are doing clever engineering, taking into account all parameters of the equation, including money, availability of trained engineers, what the real bottleneck is (if it is network, then CPU doesn't matter; if it is hard disk, then OS doesn't matter), cost of engineering, etc... Just rushing to buy the technically best solution without thinking IS stupid.

  65. not even close by Preposterous+Coward · · Score: 2

    Assuming 80GB drives each drawing 40 watts of power, and electricity rates at $0.20/kWh, you're looking at an annual power cost of less than $1 per gigabyte of spinning disk storage. That hardly accounts for the difference.
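
    For anyone who wants to check that figure, here is the arithmetic, using only the assumptions stated above (40 W per 80GB drive, $0.20/kWh, drives spinning all year).

```python
WATTS_PER_DRIVE = 40
DRIVE_GB = 80
PRICE_PER_KWH = 0.20
HOURS_PER_YEAR = 24 * 365

kwh_per_year = WATTS_PER_DRIVE * HOURS_PER_YEAR / 1000   # energy per drive
cost_per_drive = kwh_per_year * PRICE_PER_KWH            # dollars per drive per year
cost_per_gb = cost_per_drive / DRIVE_GB                  # dollars per GB per year

print(f"{kwh_per_year:.1f} kWh/year, ${cost_per_drive:.2f} per drive, "
      f"${cost_per_gb:.3f} per GB")
```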

    --

    "Biped! Good cranial development. Evidently considerable human ancestry."
    1. Re:not even close by RGRistroph · · Score: 2

      Every watt of electricity you burn in those datacenters costs you double (at least), because you then have to pay for the air conditioning to suck that heat out. And the cost there isn't in the electricity to run the AC; it's the initial and maintenance cost of a bigger AC.

      Otherwise you pay much much more than that to replace failed components.

      It's unlikely that power accounts for all of Google's choice, but its total impact is mostly NOT the pure cost of the electricity.

      Some of the posters are making arguments about needing to access disks fast, and that implying a RAID which is more expensive because you need more disks, and that is probably closest to the mark.

  66. Silly people! by m.dillon · · Score: 3, Insightful

    You guys crack me up sometimes.

    I'll lay it out. Obviously Google is not storing the master copy of the full multi-terabyte database in RAM, but they are certainly storing as big a chunk in RAM as they can, and the cost model ought to be easy for anyone to understand if you sit down and think about it.

    Consider the cost difference between the following EQUAL amounts of hard disk storage:

    * A 160GB IDE drive

    * A 160GB SCSI drive

    * Four 40GB drives in an external RAID system

    * The cost of a small medium-performance RAID system.

    * The cost of a larger high-performance RAID system scalable to a terabyte.

    * The cost of an *EXTREMELY* high-performance RAID system scalable to multiple terabytes.

    Now consider the cost of building, say, a 40 terabyte data store (let's not worry about backups for this experiment). If you build it out of a bunch of huge SCSI drives connected to a bunch of PCs it can be fairly cheap. But if you build it out of, say, high-performance EMC arrays it could cost millions of dollars more to get the same theoretical performance.

    So when you consider the cost of storage, you always have to consider the cost of the PERFORMANCE you want to get out of that storage. All the Google CEO is saying is that, Doh! It's a hellofalot cheaper to improve the performance aspects of the system by buying DRAM in a distributed-PC environment in order to be able to avoid having to purchase extremely high-performance (and extremely expensive) disk subsystems. The cost of purchasing the DRAM to make up for the lower-performing disk subsystem is actually LOWER than the cost of purchasing an equivalent higher-performance disk subsystem.

    The same is true in the ISP world. When RAM was expensive we had to rely on big whopping HD systems to scale machines up. But when RAM became cheap it turned out that you could simply throw in a very high density drive with 1/4 the performance that four smaller drives would give you, and the operating system's RAM cache would take care of the problem. Suddenly we no longer needed to purchase big whopping disk arrays.
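    To put rough numbers on that, here is a minimal sketch of the usual weighted-average model for a cached disk. The latency figures are invented for illustration (they are not measurements of any real drive or of Google's setup); the only thing borrowed from the text is the idea of a dense drive with 1/4 the performance of the array it replaces.

```python
def effective_access_ms(hit_ratio, ram_ms, disk_ms):
    """Average access time when a fraction hit_ratio of requests is served from RAM."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ram_ms + (1.0 - hit_ratio) * disk_ms


# Assumed, illustrative latencies (hypothetical, not benchmarks).
RAM_MS = 0.0001        # about 100 ns
SLOW_DISK_MS = 12.0    # one big, cheap, slow drive
FAST_ARRAY_MS = 3.0    # an array with 4x the performance

for hit in (0.0, 0.9, 0.99):
    slow = effective_access_ms(hit, RAM_MS, SLOW_DISK_MS)
    print(f"hit ratio {hit:.2f}: cheap drive + RAM cache averages {slow:.4f} ms")

print(f"uncached fast array: {effective_access_ms(0.0, RAM_MS, FAST_ARRAY_MS):.4f} ms")
```

    Under these assumptions the cheap drive overtakes the uncached fast array once the hit ratio passes about 75%, and the gap keeps widening from there. The model ignores queueing and write traffic.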

    Think about it.

    -Matt

  67. Bad cooling? by haggar · · Score: 1

    I think you must have had problems with air-conditioning.

    Our lab has about 100 servers (mostly Sun Netras and HP L-class), each with 5 drives on average, about 100 GB per server (rough average).
    That's 10 TB. This amount of storage has been "active" since the beginning of last year, and we haven't had a single failure.

    --
    Sigged!
  68. Which FS on a RAM disk? by SSJ_Ramon · · Score: 1

    Ok, in cases where the general idea is to set up a honking huge virtual disk in RAM for unbelievably fast I/O instead of using actual disks, and where for some reason you still have to go through the motions of disk usage, what filesystem is best? (ext2, ext3, ReiserFS, etc.)

    Would we ever need to run fsck on a RAM disk?

    --

    This .sig is void where prohibited, no purchase necessary.
    1. Re:Which FS on a RAM disk? by SSJ_Ramon · · Score: 1

      Doh, I meant for that last sentence to be:

      <joke>Would we ever need to run fsck on a RAM disk?</joke>

      --

      This .sig is void where prohibited, no purchase necessary.
  69. Dumbest comment by Anonymous Coward · · Score: 0

    You have no idea. Why don't you read a book on hardware? When you have finished, you will realize that your comment is 'funny', to be polite.

  70. The answer is power consumption by Anonymous Coward · · Score: 0

    The electricity needed to refresh RAM is a whole lot less than what it takes to keep a hard disk turning. It takes about 9 amps to start up an HD and about 4 to keep it turning. DRAM uses power in milliamps.

  71. If the UPS fails... by rsd · · Score: 1

    Just wondering about the effect of the power and the UPS failing.

    All DRAMs being erased...

    1. Re:If the UPS fails... by dangermouse · · Score: 1
      The UPS doesn't fail.

      And by the time it runs out of power, the generators are online.

      No real datacenter relies on a rack of BackUPS Pros. :)

  72. Re:Five minute rule paper is interesting by billstewart · · Score: 2

    I liked it, even though somebody apparently thought it was redundant. It doesn't directly apply to Google, but the principles of trading off speed and cost are still relevant even though the problem's a bit different. One thing I'd find interesting is knowing how much of Google's index data is replicated - one master copy (which might be backed up on disk) kept on N search engine boxes - vs. how much do queries get spread across multiple boxes? Does it make sense to cache the spidering on disk (probably, because rerunning spidering takes a long time, and because the article caches probably don't get hit as often, and don't need the same response speed as the indexing.)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  73. To russh347 and ParisTG by dinotrac · · Score: 1

    ;0)

  74. NICE TROLL! by Anonymous Coward · · Score: 0



    The point is not that a single disk is used for parity, just that there is a disk's worth of parity being used (NOT 2:1 as YOU originally said).

    Also, he specifically said they weren't doing JBOD.

    So fuck the hell off, bitch.

  75. Re: RAID by Anonymous Coward · · Score: 0

    There IS a dedicated parity disk in RAID-3; one needs RAID-5 for spread-around parity. Modern storage that claims RAID-3 usually does RAID-5 without telling the user. (In fact, Sun T3's ("Purple") offer both RAID-3 and RAID-5, but only do RAID-5 internally.)

  76. YHBT HAND by Anonymous Coward · · Score: 0

    YHBT YHL HAND!

  77. But where do you stick the memory? by EvlG · · Score: 2

    This sounds plausible, since you can use fewer machines. But the problem I see is, where do you find a machine that can address 80+ gigabytes of memory? Otherwise, you have to buy just as many commodity boxes to hold the RAM, which ruins the cost benefit.

    Does anyone have any insight on what machines you would use to support this scheme? Does a SAN-type device for RAM exist? Some network-attached box that holds tens or hundreds of gigabytes of RAM?

  78. DRAM cheaper in several costs but not price! by linuxislandsucks · · Score: 1

    He was speaking of costs in other respects, not the price of DRAM compared to hard disks.

    But at least the reporter could have picked up on that and specified it. Where is a good tech reporter when you need one?

    --
    Don't Tread on OpenSource
  79. Yes, but... by gidds · · Score: 1
    What happens when their UPS fails?

    Hard disks are recoverable (more or less, depending on the filesystem, whether they were shut down cleanly, etc.) If it's all in DRAM, and the power goes, you've just lost decades of indexes!

    Unless you back it up on disk, of course...

    --

    Ceterum censeo subscriptionem esse delendam.

  80. I don't think you want to do that search, Dave. by Anonymous Coward · · Score: 0

    Dave: "Google, search on 'what happened to the communications'"

    Google: "Why do you want me to do that... Dave?"

    Dave: "Google, search on 'How to turn communications dish to manual'"

    Google: "I don't think you should do that... Dave"

  81. What did the English language ever do to you? by Nindalf · · Score: 2

    I mean, obviously you have some kind of grudge against it, to abuse it that way.

    Take the RAM out of your computer and throw it at your workmate/housemate/mum. He or she will say 'Ow!', and it's not because he or she was hit by electrons!

    This would, indeed, be the use of RAM as a mechanical object but this type of use is not characteristic. You appear to be claiming with this example that any solid object (and possibly any matter) is a "mechanical component," which is wrong and would be harmful to meaningful communication if accepted.

    Any solid object's atoms move in relation to each other. This does not mean it can be said to have "moving parts" (this useful phrase would be rendered meaningless, otherwise), or make it a "mechanical device" (ditto).

    Every electrical device is utterly reliant on its physical structure to function properly, and will cease to function properly if its structure is altered beyond certain limits. A broken connection is not a mechanical failure.

    Sure, the clip that holds it in place is mechanical, and can suffer mechanical failure, but that is not part of the RAM. To note Telstra's odd problem as evidence of RAM being subject to mechanical failure is like talking about a wind-up alarm clock being struck by lightning as evidence of such clocks being subject to electrical failure (this would, of course, actually be an electrical event causing a mechanical failure).

  82. google uses IDE (EIDE,ATA,ATAPI) by Anonymous Coward · · Score: 0

    Last year it was 6000 boxes of:

    Celeron CPU
    motherboard IDE
    2 plain IDE drives
    lots of RAM
    rackmount case

    SCSI and secondary controller cards lose big.
    You gain more by just adding another PC.

    1. Re:google uses IDE (EIDE,ATA,ATAPI) by Cramer · · Score: 1

      Translation: they are banking on cheap NODES.

      They aren't building the mother of all SANs where storage integrity and thus drive lifespan are important. It might be important for the USENET archive, but their data came from somewhere so it's not that important either.

      Cheap is a very powerful motivator in modern economics.

  83. It makes sense by Bob+Smith+157 · · Score: 0, Redundant

    Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:

    1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
    2. If disk latency means that CPUs spend time idle, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
    3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
    4. Which leads, of course, to lower A/C bills during the warm weather.
    5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
    6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.

    OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?

    --


    "It's funny. On the outside, I was an honest man. Straight as an arrow. I had to come to prison to be a crook."
  84. Do some math, not all DRAM, a mixture of both by speby · · Score: 1

    When we talk about what is "cheaper" you first have to set a standard of performance. If you want X data to always be retrieved in Y or less time, then you have a point of comparison. Memory becomes cheaper than disk when the number of drives you need to ensure your level of performance becomes excessive in comparison to the amount of data each drive is storing. This is particularly true when you have to index a large amount of data. If you need to do 7 or 8 disk arm seeks to get to the data and you have a standard of performance, you may need many more disks than the capacity of the platters dictates.

    I do not believe that either all disk or all memory is ever the best solution; a blend is always needed. That blend goes from the traditional 1 to 100 ratio of memory to hard drive to 1 to 1. Remember that the DRAM still needs backup for the unusual power failures and hardware failures. In a well-performing transaction management system, you really don't want more than 1 or 2 physical I/Os to the hard drive, which means you need intelligent indexes or hashing routines and a proper amount of memory for caching.

    It really is an interesting performance tuning topic. In fact, some operating systems manage the difference between disk and memory for you, the programmer, so you are always referencing data in memory, and the OS and systems programmer control how much data is really in memory for the application and how much is on the hard drive. A very similar concept to virtual memory for programs.
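    A back-of-the-envelope version of that argument: size the disk farm twice, once for capacity and once for seeks, and take whichever is larger. This is only a sketch; the workload numbers below (1 TB of data, 80 GB drives, 1000 queries per second, 100 seeks per second per drive) are assumptions picked for illustration, and the function name is hypothetical.

```python
import math


def drives_needed(data_gb, drive_capacity_gb, queries_per_sec,
                  seeks_per_query, seeks_per_sec_per_drive):
    """Return (drives for capacity, drives for performance, drives to buy)."""
    for_capacity = math.ceil(data_gb / drive_capacity_gb)
    seeks_per_sec = queries_per_sec * seeks_per_query
    for_performance = math.ceil(seeks_per_sec / seeks_per_sec_per_drive)
    return for_capacity, for_performance, max(for_capacity, for_performance)


# Assumed workload: 1 TB of data on 80 GB drives, 1000 queries/s,
# each drive good for about 100 random seeks per second.
for seeks in (8, 2):
    cap, perf, total = drives_needed(1000, 80, 1000, seeks, 100)
    print(f"{seeks} seeks/query: {cap} drives for capacity, "
          f"{perf} for performance, buy {total}")
```

    With 8 seeks per query the farm is sized by seeks (80 drives to hold data that fits on 13), so most of the platter space is bought only to get more arms. Cutting the seeks to 2 with better indexes or more cache brings it down to 20. Replication and caching effects are left out of the model.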

  85. instructions by Anonymous Coward · · Score: 0

    Early in your boot scripts, perhaps in a file
    called /etc/rc.local, /etc/rc.d/rc.init, or
    similar, switch to RAM. You'll need a lot.
    The script must run this only once, and it
    should be run before /proc is mounted.

    cd /tmp
    mkdir old
    mount --bind / /tmp/old
    mkdir new
    mount -t ramfs none /tmp/new
    (cd old && tar cf - .) | (cd new && tar xf -)
    umount /tmp/old
    cd new
    rmdir old ; mkdir old
    pivot_root . tmp/old
    exec chroot . /bin/bash <dev/console >dev/console 2>&1
    telinit U
    # restart anything started before pivot_root
    # to free up the old filesystem for unmounting
    umount /tmp/old

    1. Re:instructions by Anonymous Coward · · Score: 0

      There was supposed to be a less-than character
      in that line with bash and dev/console. (note
      the lack of a slash on dev/console BTW) What you
      must do is redirect everything to dev/console.
      That means stdin, stdout, and stderr. You could
      also just log out, then log in again for the
      final umount step.

      Testing Slashdot mangling: dev
      and again, just greater-than: >dev
      and again, with less-than: dev

  86. silly silly by Anonymous Coward · · Score: 0

    silly silly people.... don't you understand how a business works??? ok here is what they do... they buy expensive hardware and then use it to sell their product at a loss and make it up in bulk sales...

  87. Blue screen of second death by leonbrooks · · Score: 2
    Why Windows does not run off a ramdrive

    It does. But it doesn't help much and means you have to reload the whole RAMdrive (generally over a LAN) when the box dies. Admittedly, it is a more efficient use of RAM than just handing it to Windows, since Windows (particularly the 9X stream) is a hopelessly inefficient user of RAM.

    AFAIK Linux and Open BSD cannot do this either.

    You must really have spent a lot of time and looked hard before saying that... )-:

    ``And death and hell were cast into the lake of RAM. Diskless Windows is the second death.'' -- Revelation 20:14, Geek Modified Version
    --
    Got time? Spend some of it coding or testing
  88. Cost savings by AnotherBrian · · Score: 1
    "I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"

    I would think that using solid state memory would save money over hard disks because of the cost of electricity and cooling. Those mp3 players can run for like 10 hours on a SINGLE "AA", but most hard-drive based players kill batteries faster than you can recharge them.

  89. DRAM prices by IceCreamBrain · · Score: 1

    They said that their problem right now is growth. What happens when they stop growing for a little while and there is a surplus of DRAM on the market? Some of that is going to go in the consumer direction, and cheap.

    --
    ~~Apathy alert: Approaching the Point of No Concearn
  90. Back to history ? by mystran · · Score: 1

    When we were still using floppy disks in PCs, nobody saw anything wrong with first loading a database (which had to be quite small, though) into RAM and then serving it from there. Nobody would imagine more than one person waiting for a floppy disk to load something. When a floppy disk was 360 KB and memory 640 KB, you could read almost TWO FULL DISKS into memory. No one would have bought a million computers with two floppy drives each just to serve some little database. And you also need to save that data there.

    Now that we have all the "fast" hard drives, almost nobody keeps stuff in memory. It's not the same, but if your hard drive is 10,000 times as fast as your good old floppy drive, and you have a million users instead of the 10 you used to have... are you going to buy a million computers? No, you increase the memory cache. At some point, however, there is so much "memory cache" that you can actually get some more RAM and throw that slow hard drive in the "Recycle Bin".

    You save the powerbill for hard-drives, you save the powerbill for cooling, and you don't need that many machines.

    Also for reliability. RAM fails, yeah. But so do hard drives. So double the power bill saved, as nobody will be running a non-RAID hard disk for a serious server. And then compare the time wasted when copying all the data to the newly added hard drive. Yes, SCSI can do it without the CPU. But you also lose performance from the disk access.

    In a SERVER environment every little saving counts, everything breaks, and the more of it you can keep running and the faster it runs... well... the cheaper it will be. Nobody actually cares what the hardware will cost. It will be little compared to what administration, power, spare parts, replacement servers, whatever... will cost in the long run.

    What if Slashdot did no caching?

    --
    Software should be free as in speech, but if we also get some free beer, all the better.
  91. Ever Seen the original RAM boards? by Kramer747 · · Score: 0

    When Moffett Field (now NASA Ames Research Center) in the Bay Area had an open house many years ago, I stopped in the Computer Museum/Warehouse...

    I will never forget looking at a bookshelf-sized board of ram. They were quite literally wires crisscrossed with small cheerio sized hunks of metal at each intersection. You could charge these cheerios on and off (creating 0s and 1s) by sending electricity through a wire on its x or y coordinate.

    It was soo cool. I could sit and count the number of bits on that board of RAM. Imagine counting today's 128 MB that come standard.

    Does anyone know if it's still there?

  92. dram costs money, where does it come from? by squant0 · · Score: 1

    Just wondering, but where exactly does Google make its money? If they own thousands of computers and have this huge pipe to the rest of the world, generators, UPS systems, people working there, electricity, and running water... don't they have to make money somewhere? There are no ads that I have ever seen, but they give really good info on many subjects, such as Linux, and have been up for a few years now. This isn't 1998; you actually have to make money on the internet now to maintain yourself, so where does the cash flow come from? Colombia?

  93. Caching isn't that great by one-egg · · Score: 2
    The standard response to suggestions of storing data in RAM is, "That's dumb; just let the cache do the work." But it turns out that caching doesn't do nearly as well. The overheads involved (such as the cost of finding the block in the cache) make caching significantly worse than using RAM more wisely.

    You can learn a bit more about these results from our short paper (PDF) just presented at FAST, or wait for the June Usenix conference to see a longer paper.

  94. DRAM is certainly faster than disk by jehicks · · Score: 1


    Studies of database performance show that the most effective way to speed up database access is to cache it in DRAM. I think Google would be quite sluggish if the index was kept on disk.

    One of the advantages that AltaVista had when it first came out was that the index was kept in DRAM. The servers at that time held 12GB of DRAM.

  95. Hey! Imagine a Beowulf... by Insightfill · · Score: 1

    ...oh, never mind.

  96. I bet..... by benzdesignz · · Score: 1

    he was comparing the cost of a lot of ram to maybe the cost of buying quite a few Seagate SCSI 15000 RPM drives on some crazy hardware RAID array. Otherwise, I have no idea how a lot of RAM is cheaper than hard disks.

  97. Nope, Google uses linux and NOT bsd by phoxix · · Score: 1
    yup yup, that's right, the following proves it so


    http://www.cs.uiuc.edu/whatsnew/abstracts/hoelzle901.html


    have fun :)

    1. Re:Nope, Google uses linux and NOT bsd by Terry+Dignon · · Score: 1
      i sense some anti-FreeBSD sentiment in the comments page. :-)

  98. Cost Savings by behindthewall · · Score: 1

    I don't know, but surmise that the seekable data may be held in RAM. Given Google's likely loads, they're looking at a lot of load distribution. With search data in RAM, each machine can handle more load. Therefore fewer machines are needed. The additional RAM costs less than the additional machines (and the MANAGEMENT AND MAINTENANCE of those machines) otherwise needed.

  99. SSD is what it's all about. by halsaver · · Score: 1

    The answer is SSD (solid state disk). With a disk drive made of SDRAM, the disk can be shared by all the servers. About 80% of all data traffic hits only 2-4% of your data. What is that 2-4%? Typically your database index. You can't get anything out of the database without hitting the database index first. What if you took that 2-4% and put it on something that was 250 times faster than the world's fastest RAID? You'll have taken 80% of the slow-moving data requests/responses and replaced them with extremely fast ones.

    Think about it: the slowest pieces on your entire network are your storage mechanisms: disk drives, tape drives and CDs. Ironically, besides the power supplies, these are the only mechanical devices on your network! Everything else is solid state. The key is to put the files that get hit every second of the day, like a database index, on the fastest thing you can find; put the files that get hit less frequently on RAID or disk-based storage, and the files that get hit infrequently on tape.

    The second point to note is that it's not necessarily how fast a storage device is in milliseconds, but how many I/Os (transactions) per second you are able to achieve. The biggest, fastest RAID systems in existence can do fewer than 5000 I/Os per second. A company called Texas Memory Systems, Inc. makes a product which allows 50,000 I/Os per second from each port! With the ability to have from 2-15 ports, you can get ¾ million I/Os per second. It's not rocket science what Google is doing. Beef up the network with an SSD and everything runs faster: your network, your RAID, and the customer responses.

    This is not meant to be an advertisement; however, if you have questions, please feel free to email me at halsaver@juno.com. Thanks. Ric Halsaver
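    For a sense of scale, the figures quoted above can be plugged into two small formulas: total I/Os per second across ports, and an Amdahl's-law style estimate of the overall gain when 80% of requests get 250 times faster. This is a sketch only; it assumes every request cost the same amount of time before the change, which is a simplification not stated in the comment.

```python
def aggregate_iops(iops_per_port, ports):
    """Total I/Os per second if every port delivers its rated figure."""
    return iops_per_port * ports


def overall_speedup(fraction_accelerated, speedup):
    """Amdahl's law: total gain when only part of the work gets faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / speedup)


# Numbers quoted in the comment: 50,000 I/Os per second per port, 2 to 15 ports.
print(aggregate_iops(50_000, 2), "to", aggregate_iops(50_000, 15), "I/Os per second")

# 80% of the traffic served 250x faster; the other 20% is unchanged (assumed).
print(f"overall speedup: about {overall_speedup(0.8, 250):.2f}x")
```

    Fifteen ports gives the ¾ million figure. The second number comes out just under 5x, because the untouched 20% of requests ends up dominating the total time.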