Google Prefers DRAM to Hard Disks
KP writes: "I came across this interview with Google's CEO. A very interesting
read." It's interesting in part because that CEO (Eric Schmidt) claims that for Google's purposes, "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks." "I still cannot figure out how he says storing data on DRAM is
cheaper than storing it on hard-disks. Maybe, if you buy in bulk?"
I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?
When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:
With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
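To put rough numbers on that quote, here is a minimal Python sketch. The 10 ms disk seek and the 100 ns DRAM access are assumed ballpark figures, not numbers from the interview, and the function names are made up for illustration.

```python
# Assumed ballpark latencies (illustrative, not from the interview).
DISK_SEEK_NS = 10_000_000   # ~10 ms per random seek on a rotating disk
DRAM_ACCESS_NS = 100        # ~100 ns per random DRAM access


def fetch_time_ns(items, seek_ns, mirrors=1):
    """Time to fetch `items` random records when the seeks are split
    evenly across `mirrors` identical copies working in parallel."""
    seeks_per_copy = -(-items // mirrors)  # ceiling division
    return seeks_per_copy * seek_ns


def mirrors_to_match(fast_ns, slow_ns):
    """Copies of the slow device needed to match one fast device
    in seeks per second."""
    return -(-slow_ns // fast_ns)  # ceiling division


if __name__ == "__main__":
    n = 1_000_000
    print("one disk:      ", fetch_time_ns(n, DISK_SEEK_NS) / 1e9, "s")
    print("1,000 mirrors: ", fetch_time_ns(n, DISK_SEEK_NS, 1000) / 1e9, "s")
    print("one DRAM copy: ", fetch_time_ns(n, DRAM_ACCESS_NS) / 1e9, "s")
    print("disks per DRAM:", mirrors_to_match(DRAM_ACCESS_NS, DISK_SEEK_NS))
```

Under these assumptions one disk needs about 10,000 seconds for a million seeks, a 1,000-way mirror about 10 seconds, and a single DRAM copy about a tenth of a second.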
Will I retire or break 10K?
I often see comments like this from people who have little experience in business.
What you pay for the initial product is not what it "costs" in the long-term. Businesses have a term for this called TCO or Total Cost of Ownership. It includes all the other time and materials needed to keep the item in use.
I would imagine in this case that the simple reason is that while DRAM is more expensive to purchase, it is a *lot* less expensive to run, the primary cost being power.
Also consider that if speed is of the essence, as it is with Google, it's not 50GB of RAM vs a 50GB cheap-n-cheerful IDE drive. A 50GB Ultra160 drive costs considerably more than an IDE drive and still won't come near the DRAM for speed.
[)amien
Actually, not that much of the operating system is pulled from the hard drive once the system is up. Maybe some special parts of Windows like IE and other things would benefit from being in RAM, but not the whole C:\windows tree.
There's an option in Win2k to not swap portions of the kernel out. If you have 128 MB of RAM or more, it's probably a good idea, too. The whole thing doesn't need to be in memory the whole time, just what you use the most.
Jack Buck (1924-2002)
Darryl Kile (1968-2002)
At my dad's work, they use a type of chip, but it's not dram. They use E^2prom. True, you do take a performance hit, but they have 10 "gig ethernet ports" on the thing. The last price quote I got was $12000 for a terabyte of this stuff. Don't forget to compare price/performance ratios to the best chipsets of IDE (or if you're a scsi bigot, SCSI). Pulling random data is very easy for chips, but HD's of ANY speed and quality are still slower.
Josh Crawley
AFAIK Linux and OpenBSD cannot do this either. It seems amazing to me that people have missed this idea.
You can do it in Linux (and probably in Windows too, though I'm not sure how), but there generally isn't a reason to. The VM/RD cycle swings back and forth over the years, but at present the PC world seems to be running best with a 2:1 VM ratio (using a chunk of HD about twice your RAM size to simulate more RAM), although part of this is that RAM is being used up by smart caching of disk. This holds for Windows, Linux, and (IIRC) OpenBSD.
So, the short answer is: you could do it, but it would likely slow you down overall.
-- MarkusQ
See The Five-Minute Rule, ten years later (Word Doc) or its HTML-ified Google Cache
Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.
In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the hard drive option by a factor greater than 25. If you do the math, it's possible.
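The break-even factor is just the ratio of the two per-gigabyte prices quoted above. A minimal sketch of that math, with hypothetical function names and "speedup" standing in for whatever performance measure you care about (seeks per second, say):

```python
# Prices per gigabyte as quoted in the comment.
DRAM_DOLLARS_PER_GB = 250
DISK_DOLLARS_PER_GB = 10


def break_even_factor(dram_price, disk_price):
    """How many times faster DRAM must be before it costs the same
    per unit of performance as disk."""
    return dram_price / disk_price


def dram_is_cheaper(dram_price, disk_price, speedup):
    """True when DRAM's speedup over disk more than pays for its price."""
    return speedup > break_even_factor(dram_price, disk_price)


if __name__ == "__main__":
    print(break_even_factor(DRAM_DOLLARS_PER_GB, DISK_DOLLARS_PER_GB))
    print(dram_is_cheaper(DRAM_DOLLARS_PER_GB, DISK_DOLLARS_PER_GB, 1000))
```

This only compares purchase price per gigabyte against a single speedup number; power, mirroring, and replacement costs are left out.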
Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.
I looked into using a virtual ram disk for a section of data that was being accessed quite frequently. Of course I did some reading and it turned out not to be terribly necessary.
The more memory present in the system, the more memory the Linux kernel dedicates to caching. Thus commonly read files are in memory and have incredibly fast reads. This is performed auto-magically without the user even being aware of it.
Of course no two situations are exactly alike and you may have a purpose for dedicating a ram disk to something. There are instances where you may want a fast read/response time, but the file isn't commonly used, such as the data for a squid proxy cache. A ram disk in such a situation would be entirely helpful.
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.
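A rough way to see how that adds up, assuming a constant failure rate and failed units being replaced right away. The 500,000-hour MTBF and the fleet size of 10,000 are invented for illustration, not figures from this thread:

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def expected_failures_per_year(units, mtbf_hours):
    """Expected failures per year across a fleet, assuming a constant
    failure rate and immediate replacement of failed units."""
    return units * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 500_000  # hypothetical MTBF in hours
    print(expected_failures_per_year(1, mtbf))       # a single unit
    print(expected_failures_per_year(10_000, mtbf))  # a large fleet
```

With those made-up numbers a single unit fails about once every 57 years on average, while a fleet of 10,000 loses roughly 175 units a year, which is about one every two days.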
-Eldurbarn
Yes, but it's all rather confusing. Read this thread in the Linux kernel mailing list if you're really interested. (WARNING: You won't understand any of it unless you know how the x86 virtual memory mechanism works.)
I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.
As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European Union site http://europa.eu.int. This is a fairly innocuous site, but if I do:
That's a 7.7Gb website, and that's just the text (in fact I only search for
I just think that your estimate for the cache size is a long way short of the real figure...
Simon
Physicists get Hadrons!
It's not a fair comparison to put 1GB worth of DRAM on one side of the scale, and 1GB worth of physical storage on the other. The hard disk will obviously come out to be the cheaper of the two. However, for a company like Google, which undoubtedly uses RAID technology for storage, you're effectively not getting the same "bang for your buck" as you would with a JBOD array. In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass something like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.
Mind you, that's not to say that RAID is a bad technology... heh, hardly. It's just that you can't make a 1-to-1 comparison from DRAM to physical without taking into account the storage methods employed by each.
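The 2TB-for-1TB figure corresponds to plain mirroring. A small sketch of the raw capacity needed for a given usable capacity, assuming either mirroring or a single-parity RAID 5 layout (the function names are hypothetical, and nothing here says which layout Google actually uses):

```python
def raw_tb_for_mirror(usable_tb, copies=2):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_tb_for_raid5(usable_tb, disks):
    """Raw capacity needed for RAID 5, where one disk's worth of space
    out of `disks` goes to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return usable_tb * disks / (disks - 1)


if __name__ == "__main__":
    print(raw_tb_for_mirror(1))    # two-way mirror
    print(raw_tb_for_raid5(1, 5))  # five-disk RAID 5
```

A two-way mirror needs 2TB raw per usable terabyte; a five-disk RAID 5 set needs 1.25TB, so the overhead depends heavily on the layout chosen.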
Cheers
Bowie J. Poag
DRAM fails all the time. In fact, DRAM is almost certainly responsible for more data corruption than disks are. DRAM gets SBEs (single-bit errors) all the time, whereas when disks fail, they tend to go completely down and don't return corrupt data (which is preferable, IMHO). Of course, DRAM with ECC is significantly more reliable (and also more expensive).
Um.. they do.
AND is by default
OR is OR
NOT is -
I don't think parentheses for grouping work though (they don't mention it), so you can't do more complex queries, but you can certainly do:
A AND (B OR C) AND !D
Which would be: A B OR C -D
When's the last time you checked your RAM? I get about 1 bad module for every 2 machines. Defects usually show up on the initial test, though some don't show up for a few years.
Don't believe me? Try it yourself; Memtest86. I suggest running one full test (can take days) when you first build a machine, and when you run into odd problems that you can't figure out. The default tests are good, but I've had times where it did miss problems.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.
These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)
Performance was absolutely unreal: more than twice the speed of SCSI, in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*
If you have to do anything requiring lots of I/O (like databases), you _really_ do want one of these things or something like it.
Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives) but if you are at all measuring time, then everything else loses the comparison; if you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units because they truly are part of the cost (read: necessary).
Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.
As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.
Everything in the Universe sucks: It's the law!
According to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page, the inverted index ("inverted barrels") was about 47.2Gb large (Total data without repository 55.2Gb, Repository 53.5Gb). It had about 24 million web pages indexed. Assuming a linear increase this amounts to about 5Tb.
But, to quote from the paper:
Which is surely slightly exaggerated, but shows that they considered that there is room for improvement (e.g., using a variable-length index instead of a fixed-width one).
>I dont think Linux can do it
At least they think it can do it, since they are using Linux boxes, at least according to
The Technology Behind Google, by Jim Reese CEO.
More than 10,000 Linux boxes, that is.
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
"It take 9 months to bear a child, no matter how many women you assign to the job."
RAM heats up as it's used, metal expands, the Chips on that little PCB stretch slightly, joints weaken with each power cycle, sometimes they fragment. The same thing with the connectors to the motherboard.
Telstra, in Australia, was having a hellish time with certain Cisco routers as the RAM heating up would eventually work its way out of the socket, crashing the router!
With over 35 DRAM chips on the American market, what good does it do to check only a single type of memory module from a single maker?
However, since I don't want to spend the rest of the day finding the lowest-power DRAM module with the highest capacity, I will assume that the best-case scenario is 4GB of RAM using approximately the power of two HDs of any capacity. After 4GB you would require either a custom DRAM NAS/HD or a second PC. However, NAS DRAM with multiple gigabit ethernet ports offers the most DRAM storage per watt of electricity. Still, it is at least 4x as power hungry as an 8-HD 1TB RAID server. Assume each DRAM chip in the NAS is 64 megabytes. To reach one terabyte we need 16 thousand DRAM chips. Obviously, if each chip requires even .1 watts to operate, they're using 1600 watts of power. While the HD server may need a peak of 500+ watts, even under load it still isn't using as much as when all 8 drives spin up, so it's probably only using 400 watts total for the whole system under load.
While it's pretty clear that power isn't an area where Google can save money using DRAM over HD, and while DRAM is solid state and if it doesn't fail in the first 6 months it probably won't fail in the first 100 years, it is still going to become obsolete long before it fails, requiring replacement. I've also figured that at $4 a DRAM chip the cost of 1TB is $64,000 vs $5,000 for a total-package 1TB HD server. Even if you replaced the drives every 6 months it would take 15 years before the cost of materials on HDs exceeded the cost of materials on DRAM. However, there is a cost savings. First of all, if you're mirroring the drives, that doubles the electrical and material cost of the HD storage. Second of all, that 1TB HD server is only going to have its seek time saturated by only 100 megabit ethernet, unless the data is entirely sequential (not requiring seek time), and even in the case of sequential data a single gigabit ethernet is sufficient.
That DRAM 1TB has at worst 12 ns latency, or .000000012 seconds per seek. That provides 83,333,333 seeks per second. The only thing he was wrong about is that DRAM isn't 200,000 times faster than HD for data that requires seeks; it's on the order of millions of times more efficient. 200,000 times is probably based on real-world performance differences, from using DRAM vs HD in a "real world" setting and not just on paper. That means trying to replicate the speed of DRAM with hard drives is a futile task.
Far more futile than trying to replicate the capacity of HDs with DRAM.
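For anyone who wants to check the arithmetic in this comment, here is a minimal sketch using its own inputs: 64 MB chips, .1 watts and $4 per chip, and 12 ns latency. It counts a terabyte as 1024 * 1024 MB, so the results come out slightly above the rounded 16 thousand chips, 1600 watts, and $64,000 quoted. The function name and dictionary keys are made up for illustration.

```python
MB_PER_TB = 1024 * 1024  # binary terabyte, an assumption


def dram_terabyte_estimate(chip_mb=64, watts_per_chip=0.1,
                           dollars_per_chip=4, latency_ns=12):
    """Chip count, power, cost, and seeks per second for 1 TB of DRAM,
    using the figures from the comment as defaults."""
    chips = MB_PER_TB // chip_mb
    return {
        "chips": chips,
        "watts": chips * watts_per_chip,
        "dollars": chips * dollars_per_chip,
        # Whole seeks per second at the given latency (integer math).
        "seeks_per_second": 10**9 // latency_ns,
    }


if __name__ == "__main__":
    for name, value in dram_terabyte_estimate().items():
        print(name, value)
```

The seek rate works out to 83,333,333 per second, matching the figure in the comment.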
https://www.gnu.org/philosophy/free-sw.html
And mine, too. Actually, in case you didn't recognize it, the original poster's scenario comes directly from the Terminator series. Skynet became sentient on August 29th, 1997. (Which was, incidentally, my 22nd birthday.)
--JoeProgram Intellivision!
I agree that DRAM is certainly more reliable than hard disk storage, but I should point out that a computer's power-up "memory test" is more like a "memory count" than anything. The machine says it's "testing" the memory, but it's basically paging through it to make sure it's all there. It will miss all but the most severe memory problems.
I speak from experience, as the owner of several past flaky PCs that had bad RAM, and the owner of an SGI Indigo2, which had a SIMM that would get parity errors every now and then that the POST (or whatever it's called on SGIs) would fail to detect. If you really want to test the memory, you're going to have to run some real memory-test software, which typically takes a loooong time to run (hours or days). That's because a great number of memory errors happen only slightly too frequently to be called flukes.