Google Prefers DRAM to Hard Disks
KP writes: "I came across this interview with Google's CEO. A very interesting
read." It's interesting in part becase that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is
cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"
How often do you see DRAM fail compared to Hard Disks? A bit more reliability IMHO.
They make their money on hits served so speed is far more cost effective than cost of storage medium. If they can speed up serviing hits, they're ahead of the game.
I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?
When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:
With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
Will I retire or break 10K?
Err. No.
I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250Gb of disk space for that miniscule cache. The one at Google must be of the order of a few hundred Terabytes, not Gigabytes.
On that basis, I echo the original query about how it can be economical to use RAM...
Simon
Physicists get Hadrons!
Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
Now if only Google could go out and do its own fact-checking, it wouldn't need to rely on other newspapers at all. Mark my words, by 2010 google will be the only place you go when you need information. Forget askjeeves, try listentogoogle. No humans will be involved. Scary.
By the way, this guy can't speak for beans.
The speech I give everyday is: "This is what we do. Is what you are doing consistent with that, and does it change the world?"
I often see comments from this from people who have little experience in business.
What you pay for the initial product is not what it "costs" in the long-term. Businesses have a term for this called TCO or Total Cost of Ownership. It includes all the other time and materials needed to keep the item in use.
I would imagine in this case that the simple reason is that why DRAM is more expensive to purchase it is a *lot* less expensive to run, the primary cost being power.
Also consider that if speed is of essence, as it with Google, it's not 50GB or RAM vs a 50GB cheap-n-cheerful IDE drive. A 50GB Ultra160 drive costs considerably more than an IDE and still won't come near the DRAM for speed.
[)amien
That it can handle many clients with little latency... You'd have to duplicate the data across a huge number of disks to provide similar response time to clients. Sure, if you were the only client, you couldn't tell the difference but with thousands upon thousands of clients all seeking data that would be stored in different locations on a disk things would quickly grind to a halt. Because so much unrelated data is being requested, seek time is the key. Sure, memory is more expensive per meg but its ability to serve so many more clients makes it less expensive overall.
I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.
This just shows how limited the lifespan is of 32-bit 4GB architecture, especially for servers.
At my dad's work, they use a type of chip, but it's not dram. They use E^2prom. True, you do take a performance hit, but they have 10 "gig ethernet ports" on the thing. The last price quote I got was $12000 for a terabyte of this stuff. Don't forget to compare price/performance ratios to the best chipsets of IDE (or if you're a scsi bigot, SCSI). Pulling random data is very easy for chips, but HD's of ANY speed and quality are still slower.
Josh Crawley
If they made a 2GB RAM Drive in each of their 10,000 machines then that would be 20 TB of storage. This seems sufficient to me for most storage needs.
You would still need to be able to direct searches to the machines that have the part of the data you need. This would take a high speed network and some clever programming. But it is doable.
I always was amazed at the speed of googles search engine, now I have a little more clue as to why it is so fast.
Sounds to me like they might be able to sell their database software as a money making product at some point. Oracle, watch out!
-- Never make a general statement.
See The Five-Minute Rule, ten years later (Word Doc) or it's HTML-ified Google Cache
Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.
In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the DRAM option by a factor of greater than 25. If you do the math, it's possible.
Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.
AFAIK, Google does not cache images, only HTML text. The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.
Hence the size of the cache is somewhere between 500GB and 3TB, plus the index would be another 40% of that.
My best guess is that the google archive is somewhere around a 2-3 terabytes, and that the total amount of DRAM available at google at the present time is somewhere between 5-10 terabytes.
It's important to note, though, that he states DRAM is more efficient (cost-wise? speed-wise? whatever) when it comes to storing seekable data. I wonder if that means they're using DRAM for their search indices and plain old disk for their cached content. DRAM is ideal for completely random access to multiple pieces of data, whereas disk does okay for serial access to data, the location of which is well known.
DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them each having at least two hard drive motors spinning 24/7.
actually google uses freebsd on their PCs
this makes more sence then:
/.ed site)
PC World: What are Google's biggest challenges?
Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.
If you have computers where 4 GB is not very much memory, but use the amount we use on out HD for memory i would have a dram shortage too.
And i bet they store only the most frequest used part of the index in memory.
Did you notice when you access the google cache this very slow compared to a search? Even if that cache was accessed frequently (because it references a
I looked into using a virtual ram disk for a section of data that was being accessed quite frequently. Of course I did some reading and it turned out not to be terribly necessary.
The more memory present in the system, the more memory the linux kernel dedicates to caching. Thus commonly read files are in memory and have incredibly fast reads. This is performed auto-magically without the user even being aware of it.
Of course no two situations are exact and you may have a purpose for dedicating a ram disk to something. There are instances where you may want a fast read/response time, but the file isn't commonly used. Such as the data for a squid proxy cache. A ram disk in such a situation would be entirely helpful.
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
More often than not with a database your bottleneck is I/O. When you run a database you cannot have enough disks, and you cannot have enough FAST disks. In order to accomplish the kind of I/O bandwidth that a place like google is going to need you're going to need the best EMC arrays (or perhaps an IBM Shark) money can buy. And guess what? They run you megabucks. You can't just take a bunch of SCSI disks and expect them to perform as well as Fibre channel arrays. You gotta have controllers with multiple caches. Everyone who's never dealt with databases think that SCSI is the beginning and the end of hard drives, and its so far from being the truth its not funny.
I've really no idea how complex the queries are or whether or not they use a relational database but that being said its still has to hit the disk to retrieve the data and that's where every decently designed database's bottleneck is. Besides google caches all its pages. Egads! Do you have any idea how much RAM they must need for just that alone? Yes RAM is faster. Oracle even teaches you to try to keep your frequently used tables in cache anyhow, because its fastest, of course they qualify that with the word small realizing that most people don't have the gobs of memory needed to cache large tables.
Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:
1. If each box only handles a part of the web, it is possible that most of the space on it's drive (or drives) are wasted anyway.
2. If disk latency means that cpus spend idle time, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
4. Which leads, of course, to lower A/C bills during the warm weather.
5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.
OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?
Another service that takes advantage of recency is something we just added called Overview of Today's Headlines. Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
This is a pretty cool idea. I only hope they make a RSS feed out of it so that I can use it in my companies new Portal environment. That would be really great! I love Google!
Check it out here.
KangarooBox - We make IT simple!
DRAM requires little electricity and produces almost no heat.
Hard disks consume large amounts of electricity, and produce large amounts of heat, since they consist of pieces of metal spinning at 7200rpm.
Using DRAM upfront costs quite a bit more, but uses less electricity and requires fewer chillers, condensors, etc to keep cool.
Conformity is the jailer of freedom and enemy of growth. -JFK
Individually, the mean time betweeen failure for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
-Eldurbarn
I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.
As I mentioned above, I look after a small but targetted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European union site http://europa.eu.int. This is a fairly innocuous site, but if I do:
That's a 7.7Gb website, and that's just the text (in fact I only search for
I just think that your estimate for the cache size is a long way short of the real figure...
Simon
Physicists get Hadrons!
Its not a fair comparrison to put 1GB worth of DRAM on one side of the scale, and 1GB worth of physical storage on the other. The hard disk will obviously come out to be the cheaper of the two. However, to a company like Google who undoubtedly uses RAID technology for storage, you're effectively not getting the same "bang for your buck" as you would with a JBOD array. In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.
Mind you, thats not to say that RAID is a bad technology..heh, hardly. Its just that you cant make a 1 to 1 comparrison from DRAM to physical without taking into account the storage methods employed by each.
Cheers
Bowie J. Poag
See that "mature content filter"?
How about a "mature content ONLY search"?
********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
Um.. they do.
AND is by default
OR is OR
NOT is -
I don't think parenthesis for grouping works though (they don't mention it), so you can't do more complex queries, but you can certainly do:
A AND (B OR C) AND !D
Which would be: A B OR C -D
Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.
These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)
Performance was absolutely unreal: more than twice the speed of SCSI, in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*
If you have to do anything requiring lots of I/O (like database,) you _really_ do want one of these things or something like it.
Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives) but if you are at all measuring time, then everything else looses the comparison; if you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units because they truely are part of the cost (read necessary.)
Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.
As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.
Everything in the Universe sucks: It's the law!
Just a thought:
when is it worthwhile to trade off cpu for storage? In your case, I suspect that the website has a degree of redundancy in its 7 gigs of data; there is likely much duplication. Both at the page level (duplicated ccs info), and at the snippet level (duplicated copyright disclaimers).
It is quite straight forward to discover this sharing (IIRC exactly how lzw compression works, but w/ a smaller window) and significantly cut down your storage costs. Of course, now you have a CPU hit, where storing new data becomes expensive, and just reading the data requires some pointer chasing.
The interesting issue is that the CPU hit isn't guaranteed to be a Bad Thing: your higher cache hit rate (indeed, your data may fit in ram entirely now) will possibly (likely?) result in significant speedups.
According to "The Anatomy of Large-Scale Hypertextual Web Search Engine" by Segey Brind and Lawrence Page, the inverted index ("inverted barrels") was about 47.2Gb large (Total data without repository 55.2Gb, Repository 53.5Gb). It had about 24 Million web pages indexed. Assuming a linear increase this amounts to about 5Tb.
But, to quote from the paper:
Which is surely slightly exaggerated, but shows that they considered that there is room for improvement. (E.g using varying length index instead of fixed width)
>I dont think Linux can do it
At least they think it can do it, since they are using Linux boxes, at least accoring to
The Technology Behind Google, by Jim Reese CEO.
More than 10,000 Linux boxes, that is.
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Google doesn't create content. They are a search engine. Nor are they in the business of archiving the net for posterity. If they lose data, it's out there to be recollected or if not, then there's no point in them saving it anyway.
Google will also likely break their technology into three components:
spidering and indexing
searching
caching
::Colz Grigor
Each of the financial analysts for the business groups responsible for each asepct of Google's technology may calculate the value of DRAM vs. HD differently. For searching, latency is extremely critical, but it's not so critical for caching, and there may be some physical problems with solely using DRAM for indexing.
That being said, I would expect Google to use HDs for spidering and indexing, DRAM for searching, and HDs for caching. Mr. Schmidt was probably only discussing technology on the most visable component of Google's technologies: searching.
I don't know how google to it. But typical the
main over head is the inverse file, for every word on every page, you just need the number of the page it was in and the word position on that byte. So the Google needs around 8-12 bytes per (non stoplisted) word.
You guys crack me up some times.
I'll lay it out. Obviously Google is not storing the master copy of the full multi-terrabyte database in ram, but they are certainly storing as big a chunk in ram as they can, and the cost model ought to be easy for anyone to understand if you sit down and think about it.
Consider the cost difference between the following EQUAL amounts of hard disk storage:
* A 160GB IDE drive
* A 160GB SCSI drive
* Four 40GB drives in an external RAID system
* The cost of a small medium-performance RAID
system.
* The cost of a larger high-performance RAID
system scaleability to a terrabyte.
* The cost of an *EXTREMELY* high performance RAID
system scaleability to multiple terrabytes.
Now consider the cost of building, say, a 40 terrabyte data store (lets not worry about backups for this experiment). If you build it out of a bunch of huge SCSI drives connected to a bunch of PC's it can be fairly cheap. But if you build out of, say, high performance EMC arrays it could cost millions of dollars more to get the same theoretical performance.
So when you consider the cost of storage, you always have to consider the cost of the PERFORMANCE you want to get out of that storage. All the Google CEO is saying is that, Doh! It's a hellofalot cheaper to improve the performance aspects of the system by buying DRAM in a distributed-PC environment in order to be able to avoid having to purchase extremely-high performance (and extremely expensive) disk subsystems. The cost of purchasing the DRAM to make up for the lower-performing disk subsystem is actually LOWER then the cost of purchasing an equivalent higher-performance disk subsystem.
The same is true in the ISP world. When RAM was expensive we had to rely on big whopping HD systems to scale machines up. But when RAM became cheap it turned out that you could simply throw in a very high density drive with 1/4 the performance that four smaller drives would give you, and the operating system's RAM cache would take care of the problem. Suddenly we no longer needed to purchase big whopping disk arrays.
Think about it.
-Matt
Solid state everyting would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives which have increased only slightly in that regard.
As someone else said, it is only a matter of time.
Dyolf Knip