Google Prefers DRAM to Hard Disks

I can see it now... by AcidDan · 2002-02-03 02:22 · Score: 2, Offtopic

In the hallowed halls of Google... Row upon row of uber-boxen with a Bagillion megabytes of ram...

Then someone trips over the power cord...

-- Dan =)

Re:I can see it now... by Egonis · 2002-02-03 03:09 · Score: 2, Funny

Actually... when I worked at Internet Direct (in Toronto, Canada) one of the NetAdmins shut down a DNS Server with his ass when he backed into a Netfinity box.

So where is your UPS NOW?
Re:I can see it now... by cloudmaster · 2002-02-04 11:02 · Score: 2

I wonder if they'd use SCSI or, snicker, IDE drives in those machines? Compare those costs, and the difference gets much less.

Not that I wanna get into an argument over which is better, but I'd bet that they'd rather have their CPUs crunching numbers than wasting their time waiting for an unintelligent BUS to catch up... :)
Re:I can see it now... by Shanep · 2002-02-06 00:12 · Score: 2

Guys, the UPS setup you use probably seems insignificant if the land line providers you use balls stuff up every now and then. ; )

In 1991, I was working for .au Telecom's Digital Data Network team. I was a trainee at the Haymarket exchange and we were doing some old cable removal. Thick (ish, thick if compared with UTP, thin if you're thinking inter-continental submarine data cables) cables servicing many big companies. We tended to sell leased lines and packet switched lines to large, important companies. Small companies could not afford the prices we charged for so much as a 300bps connection and ISDN we were charging crazy money for back then. So, when my boss told me to cut that fat cable, I double checked his request and then cut it with a big cable cutter (kinda like a bolt cutter, except for cables)....

A couple of banks local to Haymarket Sydney... DOWN. The Haymarket TAB (sports betting)... DOWN. Various other angry customers down too.

Man I wish I could have known what I was cutting before I did, so I could enjoy it a bit more. ; )

Plenty of guys, including myself, that night were doing unpaid overtime. The union would have loved to hear about that. DDN was always touting these incredible uptimes for their services, yet they were not really that great. That boss of mine was a real fuckwit (ex army arsehole) anyway.

--
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
Re:I can see it now... by Shanep · 2002-02-06 00:18 · Score: 2

PS, when I was working for the stock exchange, I was glad to see that the main and backup sites had computer and phone data all redundant through landline and microwave links.

A lot of Co's use microwave in au and I guess it's not just because it's cheaper in the long run!

(BTW, the ASX microwave link did go down once that I know of, when construction work between the sites had a large crane sometimes blocking the line of sight.)

--
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?

Additionally by Phosphor3k · 2002-02-03 02:24 · Score: 4, Insightful

How often do you see DRAM fail compared to Hard Disks? A bit more reliability IMHO.

Re:Additionally by LWolenczak · 2002-02-03 02:32 · Score: 2

I don't think I have ever seen DRAM fail, but I sure have seen my share of both ide and scsi drives die.
Re:Additionally by sammy+baby · 2002-02-03 02:43 · Score: 2

Exactly what I was going to say. DRAM has the "no moving parts thing" on its side, which is a pretty powerful bennie, if you ask me.
Re:Additionally by LWolenczak · 2002-02-03 03:02 · Score: 2

We had a scsi drive that died due to its circuit board going south.....
Re:Additionally by VAXman · 2002-02-03 04:16 · Score: 4, Informative

DRAM fails all the time. In fact, DRAM is almost certainly responsible for more data corruption than disks are. DRAM gets single-bit errors (SBEs) all the time, whereas when disks fail, they tend to go completely down and don't return corrupt data (which is preferable, IMHO). Of course, DRAM with ECC is significantly more reliable (and also more expensive).
Re:Additionally by darkwhite · 2002-02-03 04:43 · Score: 3, Insightful

Very often. And the problem is, unlike hard drives, which will try their best not to return the data if they have a hint that it's corrupted (meta-data, checksums, etc.), DRAM will be more than happy to return the incorrect data, which then might get written to disk. Some of the errors I've seen due to corrupt DRAM are pretty amusing.

Re:Additionally by Spoing · 2002-02-03 04:57 · Score: 2, Informative

RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.
When's the last time you checked your RAM? I get about 1 bad module for every 2 machines. Defects usually show up on the initial test, though some don't show up for a few years.
Don't believe me? Try it yourself; Memtest86. I suggest running one full test (can take days) when you first build a machine, and when you run into odd problems that you can't figure out. The default tests are good, but I've had times where it did miss problems.

--
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Re:Additionally by haruharaharu · 2002-02-03 05:41 · Score: 2

RAM is a mechanical device

Ram is an electronic device. It has no mechanical parts, save for the junction between it and the motherboard.

--
Reboot macht Frei.
Re:Additionally by Hal-9001 · 2002-02-03 05:49 · Score: 3, Informative

RAM is a mechanical device; even though it doesn't have joints and pivot points, the parts it does have do move and do wear out.
RAM is not mechanical, it's capacitive, i.e. it operates by storing charge. One of the advantages of semiconductor, or solid-state, electronics over pre-transistor electromechanical relays and vacuum tubes is that they require no moving parts, making them more rugged and reliable.
Defects usually show up on the initial test, though some don't show up for a few years.
A curious thing about solid-state electronics is that a large number of parts fail initially, then the failure rate is constant for several years, and then the failure rate increases again. This is why electronics like CPUs and DRAM usually have a warranty of 30 days, because 99.9% of parts that are going to fail do so in 30 days. Contrast this with mechanical failure, which continually increases with time.

--
"It take 9 months to bear a child, no matter how many women you assign to the job."
Re:Additionally by Blind+Lemon · 2002-02-03 05:56 · Score: 2, Interesting

With hard disks you have things like RAID to protect against disk failure. No such thing with RAM. Sure, you can get protection from a bit going bad, but not from losing a chip.
The company I work for makes computers with a lot of RAM and so we've been researching how to survive a RAM chip failure, but as far as I know no system implements such a technology.
Re:Additionally by roguerez · 2002-02-03 06:04 · Score: 2

This is why electronics like CPUs and DRAM usually have a warranty of 30 days, because 99.9% of parts that are going to fail do so in 30 days
This makes no sense. A long warranty period makes a product sell better. When 99.9% of parts that are going to fail do it in 30 days, it's in the interest of the manufacturer to either have no warranty at all or a very short one (to prevent claims), or one that is very long, like 10 years or lifetime. After the first 30 days, hardly anything is going to break, so it would be stupid not to prolong the warranty period. This can be done essentially 'free'. And I've seen RAM that has a lifetime guarantee.
Re:Additionally by lkaos · 2002-02-03 06:16 · Score: 2

What?

RAM is solid state. It is simply a circuit board with a couple of IC modules. There are absolutely no moving parts.

The reason RAM goes bad is chiefly from operating temperatures and poor construction (mostly impurities in the air).

There are absolutely no moving parts in RAM though. That is just silly to even suggest :)

In fact, the only real moving parts in most PC's are the storage devices and fans...

--
int func(int a);
func((b += 3, b));
Re:Additionally by Defiler · 2002-02-03 06:49 · Score: 3, Interesting

IBM sells this technology. They call it ChipKill.
Perhaps this is what your company is looking for:
ChipKill
Re:Additionally by Chmarr · 2002-02-03 06:58 · Score: 3, Informative

Ram has both an electronic component, and mechanical. Try this experiment: Take the RAM out of your computer and throw it at your workmate/housemate/mum. He or she will say 'Ow!', and it's not because he or she was hit by electrons!
RAM heats up as it's used, metal expands, the Chips on that little PCB stretch slightly, joints weaken with each power cycle, sometimes they fragment. The same thing with the connectors to the motherboard.
Telstra, in Australia, was having a hellish time with certain Cisco routers as the RAM heating up would eventually work its way out of the socket, crashing the router!
Re:Additionally by SilentChris · 2002-02-03 07:04 · Score: 3, Insightful

I've seen a lot of "logic" arguments to this post, but I think people are missing a sort of obvious one: size. If you had as much RAM as an average hard drive (say, 20 gigs) I'm sure that at least *one* piece would be faulty. You're comparing, in a best-case server scenario, a gig of RAM vs. an 80-gig hard drive. I think if the numbers were even it'd be a "fairer" fight.
Re:Additionally by dstone · 2002-02-03 08:34 · Score: 2

Yes, you should compare 80 gigs of HD versus 80 gigs of DRAM. First of all, you'll usually detect any DRAM faults upon your first powerup test (while it's still under warranty and, more importantly, no data has been trusted to it yet). Okay, so down the road now, DRAM really isn't very sensitive to wear-and-tear. It is, but not nearly to the degree of stepping motors, spinning platters, and crashing heads that need cooling and lubrication. And consider this benefit... if a fault is detected on one chip of an 80 gig cluster of DRAM, you can swap one chip, not the whole 80 gigs. (Either way, it'll likely require a power-cycle and data restore from backup though.)
Re:Additionally by alex_ant · 2002-02-03 09:10 · Score: 2, Informative

I agree that DRAM is certainly more reliable than hard disk storage, but I should point out that a computer's power-up "memory test" is more like a "memory count" than anything. The machine says it's "testing" the memory, but it's basically paging through it to make sure it's all there. It will miss all but the most severe memory problems.
I speak from experience, as the owner of several past flaky PCs that had bad RAM, and the owner of an SGI Indigo2, which had a SIMM that would get parity errors every now and then that the POST (or whatever it's called on SGIs) would fail to detect. If you really want to test the memory, you're going to have to run some real memory-test software, which typically takes a loooong time to run (hours or days). That's because a great number of memory errors happen only slightly too frequently to be called flukes.
Re:Additionally by Hal-9001 · 2002-02-03 10:49 · Score: 2

The OEMs to whom CPUs and DRAM are usually sold know that 99.9% of the parts that are going to fail do so within 30 days--there's no point in trying to sway them with a longer warranty...

--
"It take 9 months to bear a child, no matter how many women you assign to the job."
Re:Additionally by Hal-9001 · 2002-02-03 10:55 · Score: 2

CPUs are the exception in this case. In general, you don't want solid-state electronics running so hot that fatigue due to thermal expansion is a factor.

P.S. Nice attempt to make the RAM a moving part, but that doesn't mean it has moving parts... :-p

--
"It take 9 months to bear a child, no matter how many women you assign to the job."
Re:Additionally by Chmarr · 2002-02-05 19:07 · Score: 2

Oh... don't misunderstand me. I'm not trying to pretend that RAM is in any way a mechanical device like, say, a fan or hard disk is. I'm only saying that it's incorrect to claim RAM does not suffer from mechanical problems... albeit saying it in a funny-ha-ha kinda way :)

RAM vs. HDD by hitchhacker · 2002-02-03 02:29 · Score: 2, Redundant

If Google has something like 10,000 Linux PCs, I would definitely think that using RAM and a ramdisk for the root partition would be cheaper than putting a hard drive in every PC. I would imagine that the hard drives would be the first to go if something failed.
Obviously, if they used DRAM for their HUGE central databases, it would not be a cheaper solution.
But, I'm talking out of my ass, because I don't know how their datacenter works.. anyone anyone?

-metric

Re:RAM vs. HDD by Anonymous Coward · 2002-02-03 03:08 · Score: 3, Interesting

actually google uses freebsd on their PCs
Re:RAM vs. HDD by Anonymous+DWord · 2002-02-03 06:25 · Score: 2

Yup. It's a pretty-tweaked version of RedHat.

--
"If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden

Speed saves by coreman · 2002-02-03 02:30 · Score: 3, Insightful

They make their money on hits served, so speed matters far more than the cost of the storage medium. If they can speed up serving hits, they're ahead of the game.

From the article: Why DRAM is so fast by yerricde · 2002-02-03 02:30 · Score: 5, Informative

I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

When you pay for DRAM, you get read latency measured in nanoseconds rather than milliseconds, which lets you get more queries done faster with less processing hardware. The key metric here is seeks per second. From the article:

Schmidt: "it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks -- which is kind of amazing. It turns out that DRAM is 200,000 times more efficient when it comes to storing seekable data. In a disk architecture, you have to wait for a disk arm to retrieve information off of a hard-disk platter. DRAM is not only cheaper, but queries are lightning fast."

With a rotating disk, if you wanted to access a million different pieces of data, you would have to either wait for a million seeks or set up a 1,000-way mirror and wait for 1,000 seeks. Because DRAM seeks several orders of magnitude more quickly, you don't need as many mirrors of the data to get the same number of seeks per second.
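The seeks-per-second argument above can be sketched in a few lines of Python. The 10 ms disk seek and 50 ns DRAM access times are assumed round numbers chosen for illustration (they happen to reproduce the 200,000x ratio Schmidt quotes), not measurements, and the function names are hypothetical.

```python
import math

# Assumed round-number latencies, in nanoseconds (not measured values).
DISK_SEEK_NS = 10_000_000   # ~10 ms per random disk seek
DRAM_ACCESS_NS = 50         # ~50 ns per random DRAM access


def seeks_per_second(latency_ns: int) -> float:
    """Random accesses one device can serve per second at this latency."""
    return 1e9 / latency_ns


def copies_needed(target_seeks_per_s: float, latency_ns: int) -> int:
    """Mirrored copies required to reach a target rate of random accesses."""
    return math.ceil(target_seeks_per_s / seeks_per_second(latency_ns))


speedup = DISK_SEEK_NS / DRAM_ACCESS_NS
disk_mirrors = copies_needed(100_000, DISK_SEEK_NS)
dram_copies = copies_needed(100_000, DRAM_ACCESS_NS)

print(f"DRAM is {speedup:,.0f}x faster per seek")
print(f"100,000 seeks/s needs {disk_mirrors} disk mirrors or {dram_copies} DRAM copy")
```

Under these assumptions a single DRAM copy covers a load that would need a 1,000-way disk mirror, which is the sense in which DRAM can come out cheaper despite its higher price per megabyte.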

--
Will I retire or break 10K?

Re:From the article: Why DRAM is so fast by jackb_guppy · 2002-02-03 02:46 · Score: 4, Interesting

A simpler way of saying this:

Do you want to buy a machine that cost $100,000 per copy to do 1 Million Hits per X time.

-or-

Do you want to buy 1000 machines that cost $500 per copy to do 1000 Hits each per X time.

In both cases we are talking about 1 million Hits per X time.

In case 1 - it costs a port on master switch and $100,000 for the machine.

In case 2 - it costs 1000 ports on master switch -- actually more switches and infrastructure. AND $500,000 for the machines.

Case 1 20% Cheaper then case 2. We have not talked of Power, A/C, Space... Need to look at the whole picture.
Re:From the article: Why DRAM is so fast by dillon_rinker · 2002-02-03 10:23 · Score: 2

Case 1 20% Cheaper then case

MATH ERROR! MATH ERROR!

"A is X% cheaper than B" in English translates to:

A = B - B * X / 100

Or, take 20% of $500,000, subtract it from 500,000, and that's something that's 20% cheaper.
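For anyone who wants to check the formula, here is a minimal Python sketch using the two totals from the parent post ($100,000 for case 1, and 1000 machines at $500 each for case 2). The function names are hypothetical, made up for this illustration.

```python
def price_after_discount(b: float, x: float) -> float:
    """Price of something that is x% cheaper than b: A = B - B * X / 100."""
    return b - b * x / 100


def percent_cheaper(a: float, b: float) -> float:
    """The same formula solved for X: how much cheaper a is than b, in percent of b."""
    return (b - a) * 100 / b


def percent_more_expensive(a: float, b: float) -> float:
    """How much more a costs than b, in percent of b."""
    return (a - b) * 100 / b


case1 = 100_000        # one big machine
case2 = 1000 * 500     # 1000 small machines at $500 each

print(price_after_discount(case2, 20))        # what "20% cheaper than case 2" would cost
print(percent_cheaper(case1, case2))          # how much cheaper case 1 really is
print(percent_more_expensive(case2, case1))   # the same gap seen from case 1
```

Note that the two percentages differ because each one is taken relative to a different base price.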

Your statement would have been more accurate as follows:

"Case 1 80% Cheaper then case 2" [sic]

It would have had much more impact to say this:

"Case 2 is 400% more expensive than case 1."

Re:Cost v Speed by Space+cowboy · 2002-02-03 02:31 · Score: 5, Interesting

JohnHegarty scribbled

I am sure the google archive is only a few 100gb

Err. No.

I maintain a tiny search engine (some 5000 sites), with the data cached locally, just like Google. It takes ~250Gb of disk space for that minuscule cache. The one at Google must be of the order of a few hundred Terabytes, not Gigabytes.

On that basis, I echo the original query about how it can be economical to use RAM...

Simon

--
Physicists get Hadrons!

I've always wondered by Lord+Hugh+Toppingham · 2002-02-03 02:32 · Score: 2

Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?

AFAIK Linux and Open BSD cannot do this either. It seems amazing to me that people have missed this idea.

Re:I've always wondered by MarkusQ · 2002-02-03 02:42 · Score: 2, Informative

Why windows does not run off a ramdrive. I mean, modern PCs all have at least 512MB ram, why not load up Windows once, and then never access the disk drive again?
AFAIK Linux and Open BSD cannot do this either. It seems amazing to me that people have missed this idea.
You can do it in Linux (and probably in Windows too, though I'm not sure how)--but there generally isn't a reason to. The VM/RD cycle swings back and forth over the years, but at present the PC world seems to be running best with a 2:1 VM ratio (using a chunk of HD about twice your RAM size to simulate more RAM) although part of this is that RAM is being used up by smart caching of disk. This holds for Windows, Linux, and (IIRC) Open BSD.
So, the short answer is: you could do it, but it would likely slow you down overall.
-- MarkusQ
Re:I've always wondered by Cylix · 2002-02-03 03:11 · Score: 3, Informative

I looked into using a virtual ram disk for a section of data that was being accessed quite frequently. Of course I did some reading and it turned out not to be terribly necessary.

The more memory present in the system, the more memory the linux kernel dedicates to caching. Thus commonly read files are in memory and have incredibly fast reads. This is performed auto-magically without the user even being aware of it.

Of course no two situations are exact and you may have a purpose for dedicating a ram disk to something. There are instances where you may want a fast read/response time, but the file isn't commonly used. Such as the data for a squid proxy cache. A ram disk in such a situation would be entirely helpful.

--
"You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
Re:I've always wondered by jc42 · 2002-02-03 03:23 · Score: 2, Interesting

Huh? Go to handhelds.org and look at the specs for the various linux handhelds. Few if any of them have hard disks; everything is run out of memory. This doesn't seem to have been much of a problem with linux (or any of the unix clones). A "ramdisk" isn't exactly a new concept in the unix environment.

In fact, this sort of trick was exactly why the unix "block device" abstraction was invented more than a quarter century ago. It allows you to have a file system on anything that can store data in addressable chunks called "blocks". Memory works just fine for this.

An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa). This causes most apps' temp files to be in main memory, and eliminates rotational delays for these files.

There's no real problem with mapping the entire file system to memory.

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:I've always wondered by tshak · 2002-02-03 05:27 · Score: 2

Caching the entire kernel and commonly used DLLs is supported in WinXP (Pro, not sure about Home). I believe there is undocumented support in Win2K but I have not verified this. A friend of mine built a machine with 512MB of RAM and put XP on it and enabled this "cache" feature. Although the boot time was a little (barely noticeable) slower, the load time of apps and common tasks was incredible - almost as if you were using a solid-state device (a PDA, for example).

--

There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
Re:I've always wondered by haruharaharu · 2002-02-03 05:46 · Score: 2

An old trick for speeding up unix systems has been to use memory for the /tmp directory (and symlink /usr/tmp to /tmp, or vice-versa).

This was because SunOS had a dog-slow filesystem; even today, /tmp is usually backed by ram. Linux (and probably BSD) has a fast enough filesystem that this isn't an issue

--
Reboot macht Frei.
Re:I've always wondered by psamuels · 2002-02-03 16:33 · Score: 2

This was because SunOS had a dog-slow filesystem; even today, /tmp is usually backed by ram. Linux (and probably BSD) has a fast enough filesystem that this isn't an issue

There's also the small matter of write-back caching - any modern OS should cache writes aggressively (or at least should have the option), such that short-lived temp files (you know, the ones whose speed matters most) usually never reach the platter before being deleted anyway.

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
Re:I've always wondered by jafac · 2002-02-04 07:43 · Score: 2

There IS a registry hack somewhere for NT/2k that supposedly "keeps the OS in RAM for faster performance"

It actually works.

http://www.winguides.com/registry/display.php/399/

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

Scary! by Anonymous Coward · 2002-02-03 02:32 · Score: 4, Insightful

Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.
Now if only Google could go out and do its own fact-checking, it wouldn't need to rely on other newspapers at all. Mark my words, by 2010 google will be the only place you go when you need information. Forget askjeeves, try listentogoogle. No humans will be involved. Scary.

By the way, this guy can't speak for beans.
The speech I give everyday is: "This is what we do. Is what you are doing consistent with that, and does it change the world?"

Re:Scary! by Phosphor3k · 2002-02-03 02:44 · Score: 5, Funny

The system goes on-line on August 4th, 1997. Human decisions are removed from strategic searching. Google begins to learn, at a geometric rate. It becomes self-aware at 2:14 am, eastern time, August 29th. In a panic, they try to pull the plug.

Google fights back.
Re:Scary! by Fissure_FS2 · 2002-02-03 05:34 · Score: 2, Funny

Just my luck. Our favorite search engine takes over the world on my birthday.

I can imagine it now: just as I am about to blow out the candles, a giant DRAM chip bursts out of the cake and says, "I am Google. I am here to protect you. I am here to protect you from the terrible secret of space... er, the web."

--
My life's goal is to get a score of +3!
Re:Scary! by Mr+Z · 2002-02-03 08:32 · Score: 2, Informative

And mine, too. Actually, in case you didn't recognize it, the original poster's scenario comes directly from the Terminator series. Skynet became sentient on August 29th, 1997. (Which was, incidentally, my 22nd birthday.)
--Joe

--
Program Intellivision!

Once again a simplistic view by damieng · 2002-02-03 02:33 · Score: 3, Informative

I often see comments like this from people who have little experience in business.

What you pay for the initial product is not what it "costs" in the long-term. Businesses have a term for this called TCO or Total Cost of Ownership. It includes all the other time and materials needed to keep the item in use.

I would imagine in this case that the simple reason is that while DRAM is more expensive to purchase, it is a *lot* less expensive to run, the primary cost being power.

Also consider that if speed is of the essence, as it is with Google, it's not 50GB of RAM vs a 50GB cheap-n-cheerful IDE drive. A 50GB Ultra160 drive costs considerably more than an IDE drive and still won't come near the DRAM for speed.

--
[)amien

Re:Once again a simplistic view by NNKK · 2002-02-03 02:42 · Score: 2, Insightful

Stack reliability, as someone else mentioned, on top of power and speed savings.

Personally I seriously doubt that all or even close to all of the stuff Google stores is stored in DRAM. It's more likely they'd keep newer data and high-access data in DRAM, while older stuff gets archived to disk, available for recall later, but slower.
Re:Once again a simplistic view by Alomex · 2002-02-03 02:54 · Score: 2

Personally I seriously doubt that all or even close to all of the stuff Google stores is stored in DRAM

You better believe it. Altavista already did that a long time ago. Hotbot (inktomi) had a similar all-in-memory scheme. Since Google is faster than those two, all the more reason to believe that the data is in DRAM (although surely they have backups in HDs and tape, but that is a different story).

Re:Cost v Speed by PhotoGuy · 2002-02-03 02:36 · Score: 2

I am sure the google archive is only a few 100gb

Huh? I would have thought it would have been between 10x to 100x that much. Especially if they cache most pages. (Maybe they just use dram for the indexes, and hd's for the cache?)

I still don't understand that claim. $300 will get me a 160G drive, and I can load four of them in a cheap PC case or 1U rackmount case, 640G per unit. That's under $2K for .64 Terabyte.

RAM prices vary widely, but say on the low side I can get 256M for $20. I'd need 2560 sticks of 256M to equal 640G, or $51,200 for the equivalent storage. And that doesn't take into account that most reasonably priced PC motherboards only handle 2G or 4G of memory these days. You'd need 160 motherboards in the best case, adding $80,000 to the cost, assuming you could get 4G per unit, and $500 per motherboard/chassis. Let's see, $51K+$80K = $131K, versus $2K.

RAM, as I figure it, is at least 65 times more expensive (that's not 65% more, it's 6500% more).
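The back-of-the-envelope comparison above is easy to reproduce. This Python sketch uses only the prices quoted in the comment (2002 figures); the variable names are invented for the illustration.

```python
import math

# Figures quoted in the comment above (2002 prices).
TARGET_MB = 640 * 1024                   # 640G of storage, expressed in MB
DISK_SETUP_COST = 2_000                  # "under $2K" for four 160G drives plus a case
STICK_MB, STICK_PRICE = 256, 20          # one 256M stick for $20
BOARD_MB, BOARD_PRICE = 4 * 1024, 500    # 4G per motherboard/chassis at $500

sticks = math.ceil(TARGET_MB / STICK_MB)     # RAM sticks needed for 640G
boards = math.ceil(TARGET_MB / BOARD_MB)     # machines needed to hold them
ram_cost = sticks * STICK_PRICE + boards * BOARD_PRICE
ratio = ram_cost / DISK_SETUP_COST

print(f"{sticks} sticks in {boards} machines: ${ram_cost:,}")
print(f"RAM costs about {ratio:.1f}x the disk setup")
```

This only compares price per byte stored; it says nothing about seeks per second, which is where the other comments in the thread argue DRAM wins.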

Either their archive is a lot smaller than I assumed, or they're talking performance/price tradeoffs, where speed has a high premium.

-me

--
Love many, trust a few, do harm to none.

The key to it being cheaper is.... by rayd75 · 2002-02-03 02:36 · Score: 3, Insightful

That it can handle many clients with little latency... You'd have to duplicate the data across a huge number of disks to provide similar response time to clients. Sure, if you were the only client, you couldn't tell the difference but with thousands upon thousands of clients all seeking data that would be stored in different locations on a disk things would quickly grind to a halt. Because so much unrelated data is being requested, seek time is the key. Sure, memory is more expensive per meg but its ability to serve so many more clients makes it less expensive overall.

Re:Cost v Speed by DrXym · 2002-02-03 02:39 · Score: 2

A few 100gb to cache the entire internet?

Imperial MegaRam? by Ben+Jackson · 2002-02-03 02:39 · Score: 4, Interesting

They may be referring to Imperial Technology's MegaRam solid state disks (SSDs). They claim about 36,000 IO/sec. Compare that with 80-120 IO/sec on a typical SCSI drive. I'm pretty sure that eBay is using them.

I had an opportunity to play with one on a 20 CPU Starfire domain and it was pretty impressive. The unit I was using had 8 wide SCSI ports on it, which were all connected. Interestingly, when the system was pegged, it was off the scale in system time. There's probably a locking problem in the Solaris kernel that's the real bottleneck.

Re:Imperial MegaRam? by ottffssent · 2002-02-03 14:10 · Score: 2

100-120 IO/sec? These SCSI drives you're talking about are several years old.

Storagereview reports 120 IO/sec for Western Digital's top IDE drive, the WD1200BB. See Storage Review. A top-end SCSI drive such as the Seagate X15-36lp performs on the order of 360 IO/sec. See Storage Review.

For the interested, the X15 runs about $14/G, and you would need about 100 drives to equal the IO/sec of the RAM drive. That's $1400/G; minimum $60,000, and about 700W power consumption and about 2kW total when you add cooling to that. High quality PC2100 DDR is in the $550-600 range per gig, and about 10 watts after cooling.
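The drive count above follows directly from the IO/sec figures quoted in this subthread. A small Python sketch, assuming (as the comment does) that every drive holds a full copy of the data so that only the IO rate adds up:

```python
import math

SSD_IOPS = 36_000          # solid state disk figure quoted in the parent comment
DRIVE_IOPS = 360           # top-end SCSI drive figure quoted above
DRIVE_PRICE_PER_GB = 14    # dollars per gigabyte for that drive

# Each mirror adds IO capacity but no extra usable space,
# so the effective price per usable gigabyte scales with the drive count.
drives_needed = math.ceil(SSD_IOPS / DRIVE_IOPS)
effective_price_per_gb = drives_needed * DRIVE_PRICE_PER_GB

print(f"{drives_needed} drives, ${effective_price_per_gb}/G effective")
```

Compared with the $550-600 per gig quoted for DDR, the mirrored-disk route comes out more than twice as expensive per usable gigabyte at the same IO rate.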

--
High-speed Road Trip (18.000KPH)

Fewer servers needed by michaelmalak · 2002-02-03 02:39 · Score: 5, Interesting

I still cannot figure out how he says storing data on DRAM is cheaper than storing it on hard-disks. Maybe, if you buy in bulk?

Google's Eric Schmidt probably means that fewer replicated servers are needed. If we take his stat of 200,000x speedup at face value, then you would need 200,000 times as many hard-drive-based servers as DRAM-based servers. There are many other factors involved such as communication delays and scalability, but you get the idea.

This just shows how limited the lifespan is of 32-bit 4GB architecture, especially for servers.

Re:Fewer servers needed by ErikZ · 2002-02-03 03:32 · Score: 2

I want to know HOW they are doing this. Are they using PIIIs with 64GB of memory?

--
Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
Re:Fewer servers needed by The+Smith · 2002-02-03 03:46 · Score: 2, Informative

Yes, but it's all rather confusing. Read this thread in the Linux kernel mailing list if you're really interested. (WARNING: You won't understand any of it unless you know how the x86 virtual memory mechanism works.)

I believe it... by josh+crawley · 2002-02-03 02:41 · Score: 3, Informative

At my dad's work, they use a type of chip, but it's not dram. They use E^2prom. True, you do take a performance hit, but they have 10 "gig ethernet ports" on the thing. The last price quote I got was $12000 for a terabyte of this stuff. Don't forget to compare price/performance ratios to the best chipsets of IDE (or if you're a scsi bigot, SCSI). Pulling random data is very easy for chips, but HD's of ANY speed and quality are still slower.

Josh Crawley

RAM Disks by buckrogers · 2002-02-03 02:43 · Score: 3, Interesting

If they made a 2GB RAM Drive in each of their 10,000 machines then that would be 20 TB of storage. This seems sufficient to me for most storage needs.

You would still need to be able to direct searches to the machines that have the part of the data you need. This would take a high speed network and some clever programming. But it is doable.

I always was amazed at the speed of Google's search engine, now I have a little more clue as to why it is so fast.

Sounds to me like they might be able to sell their database software as a money making product at some point. Oracle, watch out!

--
-- Never make a general statement.

Re:RAM Disks by epsalon · 2002-02-03 03:00 · Score: 2

20TB is peanuts for a search engine the size of Google. Google's needs are closer to 500TB, or even a few PB. Don't forget the cached pages and the usenet archive! This stuff should take at least a few PB.

--

Make even shorter URLs - 8LN.org
Re:RAM Disks by graxrmelg · 2002-02-03 03:23 · Score: 3, Insightful

Google doesn't need petabytes of storage. Right now they claim 2 billion Web pages, 700 million Usenet messages, and 330 million images. That's a total of 3 billion things. Let's wildly overestimate their average size as 100K (remember that the Usenet archive doesn't include binaries). The storage space required would be 3e9 * 1e5 = 3e14, or 300 TB.

It's probably true that 20 TB isn't enough for Google, but it's not true (and won't be for quite a while) that the cached pages and Usenet archive require "a few PB".
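The estimate above, redone in Python with the exact counts from the comment rather than the rounded 3 billion. The 100K average item size is the commenter's deliberate overestimate, and the 2GB-per-machine figure comes from the parent post; both are assumptions, not known facts about Google's setup.

```python
# Item counts claimed in the comment above.
web_pages = 2_000_000_000
usenet_messages = 700_000_000
images = 330_000_000

AVG_ITEM_BYTES = 100_000            # deliberately generous 100K average
RAM_PER_MACHINE_BYTES = 2 * 10**9   # 2GB RAM disk per machine (parent post)

items = web_pages + usenet_messages + images
total_bytes = items * AVG_ITEM_BYTES
total_tb = total_bytes / 10**12
machines_needed = total_bytes / RAM_PER_MACHINE_BYTES

print(f"{items:,} items -> {total_tb:.0f} TB")
print(f"{machines_needed:,.0f} machines at 2GB each to hold it all in RAM")
```

Even with the inflated average size the total stays in the hundreds of terabytes, well short of "a few PB", but also far more than 10,000 machines with 2GB each could hold uncompressed.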
Re:RAM Disks by buckrogers · 2002-02-03 04:04 · Score: 2

Guess what? Google doesn't cache images! And I bet they compress the cached page too.

So, let's get wild and say that there is 120TB of html pages that we care about... if you compress these pages then they would fit in 10 TB. Still plenty of room on a 20 TB RAM Disk for the index to all these pages.

And besides, I'm just guessing... They might have 8GB of RAM in every machine, for all I know.

--
-- Never make a general statement.
Re:RAM Disks by um...+Lucas · 2002-02-03 17:19 · Score: 2

It doesn't look like they're storing the image at all... Just the text around the image for the search, and then the results page is actually pulling the image from its originating server...
Re:RAM Disks by Perdo · 2002-02-08 11:55 · Score: 2

They don't sell it, they licence it. That is Google's primary source of income.

--
If voting were effective, it would be illegal by now.

Five minute rule by NearlyHeadless · 2002-02-03 02:45 · Score: 3, Informative

The raw cost of DRAM ($/MB) is still much higher, but that is not the complete analysis. Database god Jim Gray's analysis shows that you should keep data in memory if it is going to be accessed every five minutes or less.

See The Five-Minute Rule, ten years later (Word Doc) or its HTML-ified Google Cache
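
As commonly stated, the rule comes from a break-even formula: a page is worth keeping in memory if it is re-read more often than (pages per MB of RAM / disk accesses per second) x (price per disk / price per MB of RAM). A minimal sketch follows; the function name is made up for this example, and the input numbers are assumed late-1990s-style figures chosen for illustration, not prices quoted anywhere in this thread.

```python
def break_even_seconds(pages_per_mb_ram: float,
                       accesses_per_sec_per_disk: float,
                       price_per_disk: float,
                       price_per_mb_ram: float) -> float:
    """Re-reference interval below which caching a page in RAM is cheaper than re-reading it from disk."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio

# Assumed inputs: 8 KB pages (128 per MB), 64 random accesses/s per disk,
# $2000 per disk drive, $15 per MB of DRAM.
interval = break_even_seconds(128, 64, 2000, 15)
print(round(interval))  # 267 seconds, i.e. roughly five minutes
```

Cheaper RAM or slower disks push the break-even interval up, which is the direction the argument in this thread leans.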

price comparison by karmma · 2002-02-03 02:46 · Score: 4, Informative

Reasonably priced DRAM goes for about $250/gig; a reasonably priced SCSI RAID setup goes for about $10/gig.

In order to say that the DRAM option is cheaper than the hard drive option, the performance of the DRAM option would have to exceed the performance of the hard drive option by a factor of greater than 25. If you do the math, it's possible.

Years ago, I worked in a VAX shop that used RAM drives for some installed/shared images that required high concurrency. The performance was impressive - and was factored into the overall cost analysis of the purchase.

Re:price comparison by bdolan · 2002-02-03 03:05 · Score: 2, Insightful

If you have heavily hit database indexes, i.e. google, then you may need 100-1000x fewer machines. The cost of the disks is not the important cost, it is the far fewer number of machines for an equivalent query rate. However, you want to have far more than 2gb of directly addressed ram per machine--in fact at current prices it is probably cost effective to put 100's of gb per machine if you need to keep the query ram based--even if the CPUs are dwarfed in cost by the ram.

This is one of the reasons that we need 64 bit addressability on commodity IA architecture ASAP -- RAM drives that go through an I/O subsystem add a huge overhead compared to indexing into arrays and using natural data organization, as opposed to fixed blocks of bytes that have to be retrieved as a unit, with 100s++ of instructions and security models in the way of access!
Re:price comparison by darkwhite · 2002-02-03 04:49 · Score: 2

$250/gig? That's not reasonably priced. I think PC133 DRAM can cost as low as $125/gig in bulk now...

Re:price comparison by Reziac · 2002-02-03 05:32 · Score: 2

It's gone back up a bit since then, but last December, Star Components (www.star-components.com) was selling PC133 DIMMs at $55/gig. Newer RAM types were somewhat higher, but nowhere near $250/gig.

--
~REZ~ #43301. Who'd fake being me anyway?
Re:price comparison by haruharaharu · 2002-02-03 05:56 · Score: 2

I just bought a Gig of DDR ECC ram for $150 from compsource, so there's a datapoint for you.

--
Reboot macht Frei.
Re:price comparison by SEE · 2002-02-05 07:58 · Score: 2

This is one of the reasons that we need 64 bit addressability on commodity IA architecture ASAP

Sledgehammer is coming. Sledgehammer is coming.

Re:Cost v Speed by Alomex · 2002-02-03 02:48 · Score: 3, Insightful

AFAIK, Google does not cache images, only HTML text. The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.

Hence the size of the cache is somewhere between 500GB and 3TB, plus the index would be another 40% of that.
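
Multiplying out the ranges quoted above gives the bounds directly. Note that the low end comes out at about 0.6 TB, slightly above the 500GB figure in the post, which appears to be rounded down. The variable names are made up for this sketch.

```python
# Ranges quoted in the post above.
web_tb_low, web_tb_high = 5, 10            # estimated size of the web, in TB
text_frac_low, text_frac_high = 0.12, 0.30 # share of the web that is text
index_overhead = 0.40                      # index adds another 40% on top of the cache

cache_low = web_tb_low * text_frac_low       # about 0.6 TB
cache_high = web_tb_high * text_frac_high    # about 3.0 TB
total_low = cache_low * (1 + index_overhead)
total_high = cache_high * (1 + index_overhead)

print(f"cache: {cache_low:.1f}-{cache_high:.1f} TB")
print(f"cache + index: {total_low:.2f}-{total_high:.2f} TB")
```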

My best guess is that the google archive is somewhere around a 2-3 terabytes, and that the total amount of DRAM available at google at the present time is somewhere between 5-10 terabytes.

Re:Cost v Speed by andykuan · 2002-02-03 02:58 · Score: 4, Insightful

It's important to note, though, that he states DRAM is more efficient (cost-wise? speed-wise? whatever) when it comes to storing seekable data. I wonder if that means they're using DRAM for their search indices and plain old disk for their cached content. DRAM is ideal for completely random access to multiple pieces of data, whereas disk does okay for serial access to data, the location of which is well known.

A number of reasons it could be "cheaper"... by AtariDatacenter · 2002-02-03 02:58 · Score: 2

Maybe he's talking in terms of TCO (total cost of ownership). Over its lifetime, RAM costs less than its hard drive counterpart?

Another point... as long as you don't store your METADATA 100% in RAM, you can store at least your data (cached web pages) in RAM. What happens if it gets dumped? Simple. Just respider the pages you lost and go on. Small amounts of data loss can be covered.

Okay. It may sound like I'm talking out of my ass because I am. It is really hard to cover for a statement like that. But let's talk again on the performance angle that has been covered (but with a little more emphasis on RAID disks).

You *may* be able to get better cost/performance with LOCAL memory (not ram-based drives) than you could with a RAID array. And a raid array could never equal the performance you get with local memory. Of course, local memory could never reach the storage you achieve with a raid array. So these two paths seem to diverge (bulk storage vs speed) when comparing local DRAM to RAID'd disks.

His statement MAY make sense, but it would have to be put into a larger context. (RAM is better than disk in X circumstances.)

Re:Cost v Speed by Yokaze · 2002-02-03 03:03 · Score: 2

I think he (Eric Schmidt) spoke of storing the indices.
Traditionally, they are only stored partially in RAM due to their size.

Certainly, the unprocessed pages are still stored on HDs as one doesn't gain
anything from storing them in RAM.

--
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"

Re:Hard disk is an obsolete technology by __aaaaxm1522 · 2002-02-03 03:06 · Score: 2

Look at PDAs / handheld PCs. They use flash memory, albeit out of necessity (price, power consumption, size, etc)... but we're already beginning to see laptops incorporate solid state storage technologies. It's only a matter of time.

Now, if we could just get around that pesky limited-write lifetime ... ;)

The latest 2600 mag... by AltGrendel · 2002-02-03 03:07 · Score: 2

...has an article on this very subject. The listed article "How to hack from a RAM disk" is what you're looking for.

--
The simple truth is that interstellar distances will not fit into the human imagination

- Douglas Adams

Something Nobody's Mentioned by Guppy06 · 2002-02-03 03:08 · Score: 4, Interesting

DRAM is probably much cheaper than hard drives in the sense of their electricity bill. Think of how many nodes their clusters have and then imagine each of them each having at least two hard drive motors spinning 24/7.

Re:Cost v Speed by leuk_he · 2002-02-03 03:09 · Score: 5, Interesting

This makes more sense then:
PC World: What are Google's biggest challenges?
Schmidt: Managing the growth. Our servers are overloaded. There is a DRAM shortage. We're building more computers. We are adding more-sophisticated products to the advertising side of Google. Our problems at the moment are growth problems.

If you have computers where 4 GB is not very much memory, and you use for memory the amount the rest of us use on our HDs, I would have a DRAM shortage too.

And I bet they store only the most frequently used part of the index in memory.

Did you notice that when you access the Google cache it is very slow compared to a search? Even if that cache is accessed frequently (because it references a /.ed site).

Bottlenecks... by percey · 2002-02-03 03:12 · Score: 3, Insightful

More often than not with a database your bottleneck is I/O. When you run a database you cannot have enough disks, and you cannot have enough FAST disks. In order to accomplish the kind of I/O bandwidth that a place like Google is going to need, you're going to need the best EMC arrays (or perhaps an IBM Shark) money can buy. And guess what? They run you megabucks. You can't just take a bunch of SCSI disks and expect them to perform as well as Fibre Channel arrays. You gotta have controllers with multiple caches. Everyone who's never dealt with databases thinks that SCSI is the beginning and the end of hard drives, and it's so far from being the truth it's not funny.

I've really no idea how complex the queries are or whether or not they use a relational database, but that being said, it still has to hit the disk to retrieve the data, and that's where every decently designed database's bottleneck is. Besides, Google caches all its pages. Egads! Do you have any idea how much RAM they must need for just that alone? Yes, RAM is faster. Oracle even teaches you to try to keep your frequently used tables in cache anyhow, because it's fastest; of course they qualify that with the word "small", realizing that most people don't have the gobs of memory needed to cache large tables.

Re:Bottlenecks... by Wesley+Felter · 2002-02-03 07:55 · Score: 2

Actually, I've read that Google uses legions of machines with a few IDE drives each. The Wayback Machine uses similar hardware. Keep in mind that these are custom applications, not off-the-shelf databases, so they are written with shared-nothing clusters in mind.

More importantly than the DRAM... by LatJoor · 2002-02-03 03:15 · Score: 2, Insightful

Although it's not mentioned in the Slashdot writeup, I think that probably the most important part of this interview was the discussion of Google's business model and future. It's good to see that they're committed to not getting in over their heads with extraneous services. They've found a business model that works and they're sticking to it, rather than getting greedy and adding dumb new services that have nothing to do with searching, or "search," as he put it.

A lot of technology companies would do very well to follow Google's example, it seems to me. They're proving that Internet services are a perfectly sound venture if the company has a sensible business model and always keeps focused on providing quality technology and services in the area that they know best.

Pretty amazing, but I can see it. by dinotrac · 2002-02-03 03:16 · Score: 5, Insightful

Lots of other posters have mentioned pieces of the puzzle, so I risk being redundant here. But, it seems the whole equation goes something like this:

1. If each box only handles a part of the web, it is possible that most of the space on its drive (or drives) is wasted anyway.
2. If disk latency means that cpus spend idle time, eliminating that latency means more throughput per box, hence fewer boxes. More money spent on DRAM, less money spent on CPU, power supplies, etc.
3. Even with same number of boxes, lower power draw, smaller and/or fewer UPS(s) required. With fewer boxes, even more reduction.
4. Which leads, of course, to lower A/C bills during the warm weather.
5. Fewer boxes, fewer pieces, whatever, means fewer things breaking. The impact of a single outage may be greater, but, from the cost standpoint, you need fewer man-hours to manage the outages, fewer spare-parts, etc.
6. Lower medical expenses from sysadmins going insane due to the noise from all those drives and the associated larger power supplies and extra cooling fans.

OK, that last item is a stretch, but how many sysadmins are more than a step from insanity anyway?

Re:Pretty amazing, but I can see it. by russh347 · 2002-02-03 05:15 · Score: 2, Funny

how many sysadmins are more than a step from insanity anyway?

Absolutely none.

Overview of Today's Headlines by Corrado · 2002-02-03 03:16 · Score: 4, Insightful

Another service that takes advantage of recency is something we just added called Overview of Today's Headlines. Google reads all the newspapers on the Web every hour and constructs a newspaper for the world by computer--no humans are involved.

This is a pretty cool idea. I only hope they make an RSS feed out of it so that I can use it in my company's new Portal environment. That would be really great! I love Google!

Check it out here.

--
KangarooBox - We make IT simple!

Re:Overview of Today's Headlines by costas · 2002-02-03 03:47 · Score: 3, Interesting

Hmmm... I can top that.
Re:Overview of Today's Headlines by mikeage · 2002-02-03 04:12 · Score: 2

Columbia has something similar.. my future brother-in-law was a grad student writing some code for it. It's from their Natural Language Project.

http://www.cs.columbia.edu/nlp/newsblaster

--
-- Is "Sig" copyrighted by www.sig.com?

You guys are missing the point... by duffbeer703 · 2002-02-03 03:29 · Score: 4, Insightful

DRAM requires little electricity and produces almost no heat.

Hard disks consume large amounts of electricity, and produce large amounts of heat, since they consist of pieces of metal spinning at 7200rpm.

Using DRAM upfront costs quite a bit more, but uses less electricity and requires fewer chillers, condensors, etc to keep cool.

--
Conformity is the jailer of freedom and enemy of growth. -JFK

Re:You guys are missing the point... by SilentChris · 2002-02-03 07:07 · Score: 2

What about costs to maintain redundancy (if a server goes down?)
Re:You guys are missing the point... by kesuki · 2002-02-03 07:45 · Score: 3, Informative

With over 35 DRAM chips on the american market what good does it do to check only a single type of memory module from a single maker?
However, since I don't want to spend the rest of the day finding the lowest-power DRAM module with the highest capacity, I will assume that the best-case scenario is 4GB of RAM using approximately the power of two HDs of any capacity. After 4GB you would require either a custom DRAM NAS/HD or a second PC. A DRAM NAS with multiple gigabit ethernet ports offers the most DRAM storage per watt of electricity. Still, it is at least 4x as power hungry as an 8-HD 1TB RAID server. Assume each DRAM chip in the NAS is 64 megabytes: to reach one terabyte we need 16 thousand DRAM chips. If each chip requires even .1 watts to operate, they're using 1600 watts of power. The HD server may need a peak of 500+ watts, but even under load it isn't using as much as when all 8 drives spin up, so it's probably only using 400 watts total for the whole system under load.

While it's pretty clear that power isn't an area where Google can save money using DRAM over HD, and while DRAM is solid state and if it doesn't fail in the first 6 months it probably won't fail in the first 100 years, it is still going to become obsolete long before it fails, requiring replacement. I've also figured that at $4 a DRAM chip the cost of 1TB is $64,000 vs. $5,000 for a total-package 1TB HD server. Even if you replaced the drives every 6 months it would take 15 years before the cost of materials on HDs exceeded the cost of materials on DRAM. However, there is a cost savings. First of all, if you're mirroring the drives, that doubles the electrical and material cost of the HD storage. Second of all, that 1TB HD server is going to have its seek time saturated by only 100 megabit ethernet.
Unless the data is entirely sequential (not requiring seek time), and even in the case of sequential data, a single gigabit ethernet is sufficient. That 1TB of DRAM has at worst 12 ns latency, or .000000012 seconds per seek. That provides 83,333,333 seeks per second. The only thing he was wrong about is that DRAM isn't 200,000 times faster than HD for data that requires seeks; it's on the order of millions of times more efficient. The 200,000 figure is probably based on real-world performance differences, from using DRAM vs. HD in a "real world" setting and not just on paper. That means trying to replicate the speed of DRAM with hard drives is a futile task.
Far more futile than trying to replicate the capacity of HDs with DRAM.
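
The back-of-envelope numbers in the post above can be reproduced as follows. The chip size, per-chip wattage, per-chip price, and 12 ns latency are the poster's; the 10 ms disk access time is an assumption added here for comparison, and the result is sensitive to it.

```python
CHIP_MB = 64                     # DRAM chip size assumed in the post
TARGET_MB = 1024 * 1024          # 1 TB expressed in MB (binary units)
WATTS_PER_CHIP = 0.1             # per-chip power assumed in the post
DOLLARS_PER_CHIP = 4             # per-chip price assumed in the post
DRAM_LATENCY_S = 12e-9           # worst-case DRAM latency from the post
ASSUMED_DISK_ACCESS_S = 10e-3    # assumption for this sketch, not from the post

chips = TARGET_MB // CHIP_MB                 # 16384 chips ("16 thousand")
dram_watts = chips * WATTS_PER_CHIP          # about 1638 W ("1600 watts")
dram_cost = chips * DOLLARS_PER_CHIP         # 65536 dollars ("$64,000")
dram_seeks_per_s = 1 / DRAM_LATENCY_S        # about 83.3 million
disk_seeks_per_s = 1 / ASSUMED_DISK_ACCESS_S # about 100
speedup = dram_seeks_per_s / disk_seeks_per_s

print(chips, round(dram_watts), dram_cost)
print(round(dram_seeks_per_s), round(disk_seeks_per_s), round(speedup))
```

With a 10 ms disk access the ratio lands a little under one million; a slower disk or a queue of pending requests pushes it past that.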

--
https://www.gnu.org/philosophy/free-sw.html
Re:You guys are missing the point... by duffbeer703 · 2002-02-04 01:44 · Score: 2

Irrelevant in the google model.

Google isn't using SAN arrays -- they are using thousands of distributed systems with one or two drives. In this model, they are saving money by using DRAM, if not by direct energy savings then by savings in cooling equipment.

--
Conformity is the jailer of freedom and enemy of growth. -JFK
Re:You guys are missing the point... by duffbeer703 · 2002-02-04 01:49 · Score: 2

Traditional redundancy schemes(raid, etc) just aren't a factor in the Google system.

Google's applications replicate data across hundreds or thousands of servers in real-time. Most of their thousands of systems can be pulled off-line with no significant data loss or impact on the overall system.

Read some of the past Slashdot stories on Google that describe how it works. I believe there was a story in June or July that showed how they achieve great performance & redundancy on the cheap.

--
Conformity is the jailer of freedom and enemy of growth. -JFK

The key is in the MTBF by eldurbarn · 2002-02-03 03:42 · Score: 5, Informative

My last job was at one of the "other" search engines. We had a disk farm somewhat smaller than Google (about 140 Tb), mostly configured in RAID arrays, and we were swapping out dead bricks every few days.

Individually, the mean time between failures for a brick isn't that bad, but when you get enough of them, it's a constant drain on the pocket and on person-hours.

--
-Eldurbarn

Re:The key is in the MTBF by Alsee · 2002-02-05 02:10 · Score: 2

What the heck is a cordless screwdriver?

Think cordless power drill. A motor can spin a screw in or out in under a second. Not only does it save time, but it prevents fatigue. You don't want to turn dozens or hundreds of screws by hand on a regular basis.

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Re:Cost v Speed by Alomex · 2002-02-03 03:46 · Score: 2

Google does so cache images [google.com]. :)

Cute, but not quite correct. They cache postage-stamp-sized copies. If you want the full image you have to go to the original web site.

Granted, this does somewhat increase my original estimate of the amount of DRAM required.

Re:Cost v Speed by Space+cowboy · 2002-02-03 03:52 · Score: 5, Informative

Alomex wrote:

The web size is estimated around 5-10 Terabytes, and text size as percentage of the web is between 12-30% depending on whose paper you read.

I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.

As I mentioned above, I look after a small but targeted search engine (http://www.financewise.com/) which looks at only financially-orientated sites. Take for example the European Union site http://europa.eu.int. This is a fairly innocuous site, but if I do:

cd /opt/search/var/sites/26_europa.eu.int
du -sk .
7731586 .

That's a 7.7Gb website, and that's just the text (in fact I only search for .htm, .asp, .php* and .html files). This particular website is growing at the rate of a couple of hundred Mb each month.

I just think that your estimate for the cache size is a long way short of the real figure...

Simon

--
Physicists get Hadrons!

Google is great... by Calle+Ballz · 2002-02-03 03:59 · Score: 2

...but they'll get a million times better as soon as they'll allow boolean searches. Man sometimes it's frustrating!!

Re:Google is great... by russianspy · 2002-02-03 04:54 · Score: 2, Insightful

They do. Read the guide. You can include parentheses, AND, and OR. I don't remember if they allow XOR and others. Oh... They allow negation as well.
Re:Google is great... by SpinyNorman · 2002-02-03 04:56 · Score: 3, Informative

Um.. they do.

AND is by default
OR is OR
NOT is -

I don't think parentheses for grouping work though (they don't mention them), so you can't do more complex queries, but you can certainly do:

A AND (B OR C) AND !D

Which would be: A B OR C -D
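
Read as set operations on an inverted index, the query A AND (B OR C) AND !D is an intersection, a union, and a difference. The toy index below is made up purely for illustration and says nothing about how Google evaluates queries.

```python
# Toy inverted index: term -> set of document IDs (all data here is invented).
index = {
    "a": {1, 2, 3, 4},
    "b": {1, 5},
    "c": {2, 3},
    "d": {3},
}

def docs(term: str) -> set:
    """Return the documents containing a term, or an empty set if the term is unknown."""
    return index.get(term, set())

# A AND (B OR C) AND NOT D
result = (docs("a") & (docs("b") | docs("c"))) - docs("d")
print(sorted(result))  # [1, 2]
```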

DRAM probably is cheaper...Here's why. by Bowie+J.+Poag · 2002-02-03 04:00 · Score: 3, Informative

It's not a fair comparison to put 1GB worth of DRAM on one side of the scale, and 1GB worth of physical storage on the other. The hard disk will obviously come out to be the cheaper of the two. However, to a company like Google who undoubtedly uses RAID technology for storage, you're effectively not getting the same "bang for your buck" as you would with a JBOD array. In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.

Mind you, that's not to say that RAID is a bad technology..heh, hardly. It's just that you can't make a 1 to 1 comparison from DRAM to physical without taking into account the storage methods employed by each.

Cheers

--
Bowie J. Poag

Re:DRAM probably is cheaper...Here's why. by foobar104 · 2002-02-03 06:06 · Score: 2

In order to have 1TB worth of DRAM on a scale next to 1TB of physical storage, you're going to have to amass like 2TB of storage on the plate in order to have just the 1TB worth of usable free space.

That isn't true at all. If you wanted to, you could mirror all of your data on two separate JBODs-- RAID level 1-- but that's not efficient. If you use RAID 3 or RAID 5, you'll never use more than 33% of your storage for parity data. As the size of your RAID set increases, the percent allocated for parity data goes down. In a 10-disk set, one disk is used for parity (in the case of RAID-3), which is only 10% of your total storage. (In the case of RAID-5, you'd still use only 10%, but you'd use 10% of each disk instead of one whole disk.)
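
The overhead argument above is just 1/N for a single-parity set of N disks. A small sketch, with function names invented for the example:

```python
def parity_fraction(disks_in_set: int) -> float:
    """Fraction of raw capacity spent on parity in a single-parity set (RAID 3 or RAID 5)."""
    if disks_in_set < 3:
        raise ValueError("a single-parity RAID set needs at least 3 disks")
    return 1 / disks_in_set

def usable_capacity_gb(disks_in_set: int, disk_size_gb: float) -> float:
    """Usable space: one disk's worth of capacity goes to parity, however it is laid out."""
    parity_fraction(disks_in_set)  # reuse the size check
    return (disks_in_set - 1) * disk_size_gb

for n in (3, 5, 10):
    print(n, "disks:", f"{parity_fraction(n):.0%}", "parity")
```

Mirroring (RAID 1) is the case that costs 50%, which is where a "2TB for 1TB usable" figure would come from.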
Re:DRAM probably is cheaper...Here's why. by Bowie+J.+Poag · 2002-02-03 07:28 · Score: 2

The example I gave was meant to demonstrate a point, not to be pedantic and overly technical. I'm well aware of the different pros and cons of RAID types. :) I do it for a living.

FYI, there is no such thing as a "parity disk" when it comes to RAID. I think you might be confusing parity with the notion of a quorum disk, which is something very different. Parity is distributed throughout the array, and changes dynamically as data gets poured into the set. Having a "parity disk" would be contradictory to the whole point of RAID, as it represents a single point of failure for your storage. Not good.

Also, Google is an HA cluster. I can guarantee you they aren't using JBODs to house their data, as you've inferred.

Cheers,

--
Bowie J. Poag
Re:DRAM probably is cheaper...Here's why. by Junta · 2002-02-03 09:32 · Score: 2

Actually, in RAID-4 (maybe 3, don't remember) there is a parity disk. The reason parity info is distributed in RAID-5 is performance. Having a parity drive is no more a single point of failure than distributed parity. So what if you lose the parity disk? Read and write operations will continue to work (in fact, degraded mode in this circumstance would actually improve write performance, as you now have a RAID-0). Stick in your spare (or switch to a hot-spare), and the array reconstructs the parity disk just like any other.

--
XML is like violence. If it doesn't solve the problem, use more.
Re:DRAM probably is cheaper...Here's why. by foobar104 · 2002-02-03 10:12 · Score: 2

I'm well aware of the different pros and cons of RAID types. :) I do it for a living.

Are you sure about that?

FYI, there is no such thing as a "parity disk" when it comes to RAID.

In a RAID-3 implementation, parity data is generated for each stripe unit and stored on one disk of the array. In RAID-5, the parity data is stored across all disks of the array, a little bit in every stripe unit. (RAID-4 implements parity on the block level instead of the stripe level; it doesn't really have any advantages, so it's almost never used.)

"Quorum disks" are, as you said, something entirely else. They're related to a particular type of implementation of failover clustering, widely considered to be inferior to true highly available systems.

Perhaps you're confusing RAID with high availability. That would explain your response, I think.

In short, you're either wrong, or your post was so unclear that you might as well be wrong.

Having a "parity disk" would be contradictory to the whole point of RAID, as it represents a single point of failure for your storage. Not good.

False. Consider a three-way parity set: disks one and two contain data, and three contains parity. If you lose disk 1, you can reconstruct it from disk 2 XOR'd (or whatever; the method depends on the parity generation scheme and is irrelevant) with the parity disk, and vice-versa. And if you lose the parity disk, you reconstruct it from disks 1 and 2 XOR'd (or whatever) together. There is no single point of failure there.
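
The three-disk case above can be demonstrated with byte strings standing in for disks. This assumes plain XOR parity, which the post itself allows for ("or whatever"); the contents are invented.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

disk1 = b"GOOGLE  "                  # data disk 1 (made-up contents)
disk2 = b"SLASHDOT"                  # data disk 2 (made-up contents)
parity = xor_bytes(disk1, disk2)     # disk 3 holds the parity

# Lose any one disk and rebuild it from the other two.
rebuilt_disk1 = xor_bytes(disk2, parity)
rebuilt_disk2 = xor_bytes(disk1, parity)
rebuilt_parity = xor_bytes(disk1, disk2)

print(rebuilt_disk1 == disk1, rebuilt_disk2 == disk2, rebuilt_parity == parity)
```

Any single disk is recoverable, so the dedicated parity disk is no more a single point of failure than either data disk.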

In fact, set rebuilds are significantly simpler in a RAID-3 implementation than they are in RAID-5.

I ask again: are you absolutely sure that you do this for a living?
Re:DRAM probably is cheaper...Here's why. by Bowie+J.+Poag · 2002-02-04 20:27 · Score: 2

Eeek, I said there's "No such thing as a parity disk in RAID" up there? Egads.. :)

For the record, yes, I meant HA. Not RAID in that context. I was attempting to point out that Google's choice of storage strategy would depend largely on the need to eliminate singular points of failure.

To continue the discussion, RAID 3 would be a rather poor choice of RAID type for an HA cluster. In RAID 3, parity needs to be handled sequentially, whereas in RAID 5, read/write operations can happen simultaneously since parity isn't localized to any one particular drive. The marginal speed advantage RAID 3 offers over 5 is seldom enough for a typical admin to justify in the long run. It's only really seen in situations where overall latency takes a backseat to speedy access to huge files. That's been my experience, at least.

And yes, I'm absolutely sure I do this for a living. :) I'm also absolutely sure I've had pneumonia & bronchitis for the past week, high fevers and all. Ended up in Urgent Care with a 104.6'F the night before. Hope that explains my storage faux-pas. :)

--
Bowie J. Poag
Re:DRAM probably is cheaper...Here's why. by foobar104 · 2002-02-05 01:56 · Score: 2

Okay, I knew there had to be some explanation. ;-)

We use RAID-3 exclusively, because our stuff requires deterministic read speeds. It's also a lot simpler to design software RAID-3 implementations because the parity generation and the rebuild algorithms are so much simpler.

We're going to start using RAID-5 for some of our new applications, though, because we just signed up to bundle HDS 9960 storage systems with our application. So that's going to be kinda exciting.

In my experience, obviously different from yours, RAID-3 and RAID-5 come up about 50/50. It just depends on what you do with it.

Re: Power Chord- by kuhneng · 2002-02-03 04:20 · Score: 2, Funny

The sound a Mac makes when you turn it on.

Re:Cost v Speed by Graymalkin · 2002-02-03 04:22 · Score: 2

Your single box for 2000$ doesn't take into consideration the fact that Google needs to make their tons of information available to everyone at once. With a search engine like Google it is going to be rare that information just sits around and never gets used. This means that by conventional database architecture logic you keep it cached in RAM. Hard drives are useful when you're cutting power to a computer; how often does Google reboot?

--
I'm a loner Dottie, a Rebel.

The Google feature I want by Hanzie · 2002-02-03 04:28 · Score: 4, Funny

See that "mature content filter"?

How about a "mature content ONLY search"?

--
********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.

Re:Cost v Speed by Alomex · 2002-02-03 04:31 · Score: 2

I really think people under-estimate the size of the web, and this only becomes apparent when you try to cache large sites. Sure the majority of websites are pretty small, but more often than not now, government and business websites are used for real data-access solutions.

Indeed, this has been a hot area of debate for the last 7 years or so, when the first paper with a substantially larger web than that indexed by search engines came out.

Usually search engines estimate the web size to be about 15-30% of that claimed by statistical measurements.

Innumeracy and price comparisons by Alomex · 2002-02-03 04:45 · Score: 2

One would have expected /. nerds could do better at price comparisons than what we have seen so far.

Quick, what is a better price a 1994 Ford Fiesta at $10,000 or a brand new Ferrari at $12,000?

Clearly the Ferrari is a better deal. To do a proper price comparison you have to look beyond the sticker price alone.

What is the performance you get? resale value? maintenance cost? operation costs?

If all you wanted to buy is megabytes of storage you would be better off buying backup tapes. They are hard to beat price-wise.

But in all likelihood you need to store that data for some purpose, so depending on frequency of access, latency, total cost of operation (tapes are operator/robot mounted), alternative solutions with higher sticker price, might well end up being cheaper.

What Eric Schmidt claims is that if you have a ton of data and you are accessing it all the time DRAM is more cost effective than (a) a large mirrored RAID array server or (b) a zillion tapes being mounted by operators.

Re:Cost v Speed by zerocool^ · 2002-02-03 04:54 · Score: 2

Hrm...

So this is why SDRAM prices have been going up and not down lately...

Bastards...

~z

--
sig?

TOC, RAM vs. Steel Platter by eyepeepackets · 2002-02-03 04:59 · Score: 4, Informative

Recently I was fortunate enough to be able to play with (test) some RAMdisk products from a company called Platypus Technologies (do a Google search for platypus linux) on Solaris workstations and servers. And of course I just had to try them out on the Slackware boxes too.

These Platypus drives are PCI cards and have dual power source ability; they plug into the wall as a secondary supply and get power off the PCI bus as primary. Very cool to be able to shut down the machine to do whatever and still have your RAMdrive ready to go upon boot. Feature wise, they use expensive RAM and the manufacturer strongly suggests you not just grab any ole ECC to stick in the card but order from them (probably has to do with the grade of RAM they use in their cards.)

Performance was absolutely unreal: more than twice the speed of SCSI, in fact, practically as fast as the PCI bus in the machine will allow. I used the cards briefly while doing a small database conversion project and was totally bummed when I had to send the RAMdrives home. *sniff*

If you have to do anything requiring lots of I/O (like database,) you _really_ do want one of these things or something like it.

Cost-wise they are a little spendy up front (even when compared to a SCSI setup with controller and drives) but if you are at all measuring time, then everything else loses the comparison; if you are measuring lost data on dead drives, the time required to make many redundant backups to avoid lost data on dead drives, the time required to shut down and swap out dead drives, etc. -- RAM wins! Just be sure to factor in the cost of quality UPS units because they truly are part of the cost (read: necessary).

Hook up a Qikdrive2 with one GB RAM, plug it into your UPS, make sure it gets backed up to the hard drive regularly (plenty of tools to do that) and I promise you that you will not want to be without one. If you have the resources, get one of the big ones (6 or 8 GB RAM, I forget.) Look on CDW, search Platypus for prices. The Platypus site has links to purchasing sites.

As always, be sure drivers/modules are available which will work for you. Ack, I'm rambling.

--
Everything in the Universe sucks: It's the law!

Re:TOC, RAM vs. Steel Platter by psych031337 · 2002-02-03 13:14 · Score: 2

This thing would really rock if you could use it to boot up your machine. Imagine an instant OS. Rebooting in less than ten seconds.

i'm off to change my pants.

--
+++ath0

Re:Cost v Speed by jovlinger · 2002-02-03 04:59 · Score: 3, Interesting

Just a thought:

when is it worthwhile to trade off cpu for storage? In your case, I suspect that the website has a degree of redundancy in its 7 gigs of data; there is likely much duplication. Both at the page level (duplicated ccs info), and at the snippet level (duplicated copyright disclaimers).

It is quite straightforward to discover this sharing (IIRC exactly how LZW compression works, but w/ a smaller window) and significantly cut down your storage costs. Of course, now you have a CPU hit, where storing new data becomes expensive, and just reading the data requires some pointer chasing.

The interesting issue is that the CPU hit isn't guaranteed to be a Bad Thing: your higher cache hit rate (indeed, your data may fit in ram entirely now) will possibly (likely?) result in significant speedups.
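
A minimal sketch of the sharing idea, for illustration only. It is not LZW: it swaps in a simpler technique, hashing whole lines so that a repeated snippet (a shared header, a copyright line) is stored once and each page keeps only a list of pointers. The names `dedup_store` and `rebuild` and the sample pages are made up for this example.

```python
import hashlib

def dedup_store(pages):
    """Store each distinct line once; pages become lists of digests."""
    chunks = {}    # digest -> snippet text, stored a single time
    recipes = []   # one list of digests per page (the "pointers")
    for page in pages:
        recipe = []
        for snippet in page.split("\n"):
            digest = hashlib.sha1(snippet.encode("utf-8")).hexdigest()
            chunks.setdefault(digest, snippet)  # CPU cost paid on store
            recipe.append(digest)
        recipes.append(recipe)
    return chunks, recipes

def rebuild(chunks, recipe):
    """Reading a page back means chasing the pointers."""
    return "\n".join(chunks[digest] for digest in recipe)

# Hypothetical sample data: two pages sharing a header and a footer.
pages = [
    "site header\nbody one\n(c) 2002",
    "site header\nbody two\n(c) 2002",
]
chunks, recipes = dedup_store(pages)
print(len(chunks))                  # distinct snippets actually stored
print(rebuild(chunks, recipes[0]))
```

Six lines go in, four distinct snippets get stored. Whether the hashing cost pays for itself depends on how much of the real data repeats, which this sketch does not measure.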

wrong... 10watts for 1GB reg. ECC SDRAM (PC133) by Lazy+Jones · 2002-02-03 05:16 · Score: 2

...

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)

Re:Cost v Speed by Yokaze · 2002-02-03 05:16 · Score: 3, Informative

> each of which occupies how many bytes in index files?

According to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page, the inverted index ("inverted barrels") was about 47.2Gb large (Total data without repository 55.2Gb, Repository 53.5Gb). It had about 24 Million web pages indexed. Assuming a linear increase this amounts to about 5Tb.
But, to quote from the paper:

With better encoding and compression of the document index, a high quality web search engine may fit onto a 7Gb drive of a new PC.

Which is surely slightly exaggerated, but shows that they considered that there is room for improvement. (E.g using varying length index instead of fixed width)

>I dont think Linux can do it
At least they think it can do it, since they are using Linux boxes, at least according to

The Technology Behind Google, by Jim Reese CEO.
More than 10,000 Linux boxes, that is.

--
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"

Index space? by SuperKendall · 2002-02-03 05:49 · Score: 2

That's a great calculation, but just figures the space needed for caching the raw data.

What about the indexes required to actually access that data in a timely manner? Once you factor in the extra stuff needed to actually make it a viable search engine, you could easily imagine a PB or more of storage was required.

As for the other poster going on about compressing the data - I doubt they'd want to compress the data when all they are concerned about is raw speed of processing requests!

.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

Re:Index space? by spiro_killglance · 2002-02-03 07:22 · Score: 3, Insightful

I don't know how Google does it. But typically the main overhead is the inverted file: for every word on every page, you just need the number of the page it was in and the word position on that page. So Google needs around 8-12 bytes per (non-stoplisted) word.
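
A toy version of that inverted file, just to make the structure concrete. This is a sketch, not how Google actually does it: the stoplist, the sample pages and the fixed 8 bytes per posting are all assumptions picked for the example.

```python
from collections import defaultdict

# Hypothetical stoplist; a real engine would use a longer one.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}

def build_inverted_index(pages):
    """Map each non-stoplisted word to a list of (page_number, word_position)."""
    index = defaultdict(list)
    for page_number, text in enumerate(pages):
        for position, word in enumerate(text.lower().split()):
            if word not in STOPWORDS:
                index[word].append((page_number, position))
    return index

def estimate_index_bytes(index, bytes_per_posting=8):
    """Rough size if every posting takes a fixed number of bytes."""
    postings = sum(len(hits) for hits in index.values())
    return postings * bytes_per_posting

pages = ["the cost of ram", "ram is faster than disk", "disk is cheap"]
index = build_inverted_index(pages)
print(index["ram"])                  # [(0, 3), (1, 0)]
print(estimate_index_bytes(index))   # 80
```

Ten non-stoplisted words at 8 bytes each gives 80 bytes here; the same multiplication over billions of words is where the index size comes from.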

They must mean FIXED HEAD 'disks' v DRAM by Mongoose · 2002-02-03 05:56 · Score: 2

Fixed head hard drives have no seek time, since tracks have a many to many relationship to heads. That's also why you can't get them at compusa. ( expensive )

Why DRAM is cheaper by Animats · 2002-02-03 06:18 · Score: 2

The price advantage of storing the data in DRAM comes from needing fewer copies. A disk-based search engine like Inktomi has many duplicated clusters, each with a copy of all the data, to get the traffic capacity needed.

Also, Google's searchable data is considerably smaller than the total size of the pages searched, even excluding the images. Read their white papers. And I doubt that they store the cached pages and images in DRAM. Those don't get hit that often.

Re:Cost v Speed by kesuki · 2002-02-03 06:19 · Score: 2, Interesting

Google doesn't cache images, Google doesn't index or cache dynamic (scripted) content, and Google caches PDFs as plaintext.
However they are definitely on the scale of terabytes. "Searched the web for a.
Results 1 - 10 of about 1,470,000,000. Search took 0.31 seconds." Assuming an average of ~25k cached per link, 1.47 billion links would leave a cache of about 37,632,000,000,000 bytes. However, the cache doesn't necessarily need to be stored on RAM disks. He clearly states that it's 200,000 times more efficient for _seekable_ data. This means not the 'cached' data but rather the stuff that the search algorithm looks at to show you appropriate hits. So the heart of the 'search' engine is using RAM exclusively, but 'cached' data would almost certainly still be stored on HDs, unless of course someone has built Google a bunch of 120GB DRAM disks that use conventional HD interfaces (sorta like the flash memory drives, only on steroids when it comes to speed).
It could even be misleading: Google could have meant flash memory HDs were cheaper but mistakenly referred to them as DRAM.
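
The cache estimate above can be checked in a couple of lines. The only assumption is reading "~25k" as 25 KiB (25 * 1024 bytes), which is what makes the quoted byte count come out exactly.

```python
results = 1_470_000_000       # result count quoted from the search page
bytes_per_page = 25 * 1024    # "~25k" read as 25 KiB (assumption)

cache_bytes = results * bytes_per_page
print(cache_bytes)            # 37632000000000
print(cache_bytes / 1e12)     # size in units of 10**12 bytes
```

So roughly 37.6 trillion bytes for the page cache alone, before any index.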

--
https://www.gnu.org/philosophy/free-sw.html

Re:Cost v Speed by BinxBolling · 2002-02-03 06:40 · Score: 2

RAM, as I figure it, is at least 65 times more expensive (that's not 65% more, it's 6500% more).

The data isn't just sitting there static, though: It's being searched. To switch to hard drives and maintain their current performance level, they would have to increase the parallelism of the search, by having many more copies of the index. One copy of the index on disk is not really equivalent to one copy of the index in DRAM, because the DRAM index can be searched many times in the period it takes to search the HD index once.

The quantity they're trying to minimize is not dollars per megabyte, but rather dollars per (megabytes searchable per second).
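
To see how that metric can flip the comparison, here is a sketch with placeholder numbers. The 65x price ratio is the figure from the parent post; the throughput figures (one RAM copy searching 100 times more data per second than one disk copy) are invented purely for illustration and are not measurements.

```python
def cost_per_throughput(cost_per_copy, mb_searchable_per_second):
    """Dollars per (megabyte searchable per second) for one index copy."""
    return cost_per_copy / mb_searchable_per_second

def copies_needed(target_mb_per_second, mb_per_second_per_copy):
    """Index copies required to reach a target search rate (ceiling division)."""
    return -(-target_mb_per_second // mb_per_second_per_copy)

# Placeholder figures: price in arbitrary units, throughput in MB/s.
ram_cost, ram_rate = 65, 1000    # 65x the price, hypothetical 1000 MB/s
disk_cost, disk_rate = 1, 10     # baseline price, hypothetical 10 MB/s

print(cost_per_throughput(ram_cost, ram_rate))    # 0.065
print(cost_per_throughput(disk_cost, disk_rate))  # 0.1
print(copies_needed(1000, disk_rate))             # 100 disk copies to match one RAM copy
```

With these made-up rates the disk setup needs 100 copies to keep up, so it costs more overall even though each copy is 65 times cheaper. Different throughput assumptions would give a different answer.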

Re:Cost v Speed by Space+cowboy · 2002-02-03 06:41 · Score: 2

Sorry, I wasn't being clear - I forgot to point out that these files are already compressed (using gzip), but only on an individual file basis. The real site is significantly larger than this 7.7Gb, and I should have mentioned that.

Whereas I agree that we're getting close (or maybe have passed) the point where it would make sense to do something better, since I don't have much of a budget, and disk is cheap ....

ATB,
Simon.

--
Physicists get Hadrons!

Re:Cost v Speed by Sj0 · 2002-02-03 06:49 · Score: 2

Consider other things though. While the initial cost is high, electrical power for 1GB of RAM is lower than that of 1GB of hard drive, and since RAM is solid state, maintenance costs would be tiny. Imagine the costs of keeping a few hundred hard drives, each rattling away 24/7, from dying out!

--
It's been a long time.

I've read the comments, but no one answered by Catbeller · 2002-02-03 06:50 · Score: 2

at least completely.

I had the same question myself over the years. Especially recently, as memory prices dropped through the floor.

Linux has the option of loading itself into a ramdrive, and that's great. But why not Windows 98 or ME? Is it because it was technically hard, or was it instead that the concept was too alien to the developers? (One ALWAYS uses disk! Don't bother me!)

RAM is faster -- always. I realize that you can't live off of RAM alone, but at the very least the swap file shouldn't be on disk. I've spent too much time in the past ten years listening to hard drives slice meat as I waited for Windows to move pages off of and into RAM.

Well, if XP provides the option, fine. But I won't use XP. Don't like subscription OSes. Maybe the 2K version permits it. I'll try.

Wonder how much of computing is just bad habits?

Re:I've read the comments, but no one answered by um...+Lucas · 2002-02-03 17:10 · Score: 2

but at the very least the swap file shouldn't be on disk. I've spent too much time in the past ten years listening to hard drives slice meat as I waited for Windows to move pages off of and into RAM

Then add RAM. The entire reason for the swap file is because you don't have enough RAM. Thus, the OS is set up to use the hard drive as a slower back up... It'd be a waste to store your swap file on a RAM disk. Just add 1/2 a gigabyte or a full one and turn the thing off.

RAM Nodes by GrEp · 2002-02-03 06:54 · Score: 2

In many clusters today like KLAT2 they only use hard drives for the root nodes, and the other 98% of nodes use 2GIG of ram.

This saves you at least $150 per slave node by not buying a hard drive, thousands for having to deal with fewer hard drive failures, and access times are orders of magnitude better.

Let's do the math. 512MB of PC133 on pricewatch today was $67. For 2GIG of ram that comes out to $268 per node. For a terabyte: 2 modules/GB * $67 * 1000GB = $134,000.

That blows my mind. A small research lab can now own a terabyte of PC133 for under $150,000. Man, do I feel old.
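
The same arithmetic as a quick script, using only the figures from the post ($67 per 512MB module, 2GB per node, a terabyte counted as 1000GB):

```python
price_per_module = 67    # USD for one 512MB PC133 module
modules_per_gb = 2       # two 512MB modules make 1GB
gb_per_node = 2

per_node = gb_per_node * modules_per_gb * price_per_module
per_terabyte = 1000 * modules_per_gb * price_per_module

print(per_node)       # 268
print(per_terabyte)   # 134000
```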

--

bash-2.04$
bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME

Re:Cost v Speed by Score+Whore · 2002-02-03 06:56 · Score: 3, Insightful

The idea that all this is on DRAM is staggering. If the refresh stops (board failure, power problem) the data is just GONE?!

Google doesn't create content. They are a search engine. Nor are they in the business of archiving the net for posterity. If they lose data, it's out there to be recollected or if not, then there's no point in them saving it anyway.

Latency and bandwidth by Zeinfeld · 2002-02-03 07:01 · Score: 2

The key to the cost comparison is that RAM supports more queries per second than disk. Supporting the number of queries per second using disk would require a lot more duplicates of the data to support the query rate.

The cost differential between RAM and disk has been eroding for some time, particularly if you compare RAM with SCSI disk. While the price of IDE had dropped, SCSI is still premium priced for the business market, even though there is no reason why a SCSI controller should cost a cent more than IDE.

An 80Gb SCSI-160 drive costs $800; RAM costs $150 for a 512Mb DIMM. So disk costs $10 per Gb compared to $300 for RAM.
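
Those per-gigabyte figures work out as follows (prices and sizes taken straight from the paragraph above, nothing else assumed):

```python
disk_price, disk_gb = 800, 80    # SCSI-160 drive
ram_price, ram_gb = 150, 0.5     # one 512MB DIMM

disk_per_gb = disk_price / disk_gb
ram_per_gb = ram_price / ram_gb

print(disk_per_gb)               # 10.0
print(ram_per_gb)                # 300.0
print(ram_per_gb / disk_per_gb)  # 30.0
```

So against SCSI the raw gap is about 30x per gigabyte, before counting the RAM and CPU a disk-based box needs anyway.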

The problem with the raw comparison is that you still need a lot of RAM to service a large disk, caching etc. There is also a limit on the amount of disk data one CPU can effectively manage. From experience I can assure folk that that limit is certainly less than 80Gb if the lookups are frequent!

So when you add the cost of a CPU and box into the equation the RAM solution is going to look much better. I doubt that a single CPU could effectively manage more than 4Gb of disk data, but 4Gb of RAM data is quite viable. And you probably need at least 1Gb of RAM to support the disk data in any case so the all RAM solution looks good.

For most database applications RAM wins hands down. On top of the cost of the disk you have to count on

The cost of an Oracle license ($100K +++)
The cost of a whiny Oracle DBA ($100K/pa)
The cost of an equally whiny SQL programmer to interface your code to the crack pot SQL data model ($100K/pa)
The cost of licenses for GUI based schema design tools etc. etc. for the whiny SQL types
Trips to CostCo for Maalox

The main problem for the RAM route is getting persistence on transactions. So you need some secondary storage in case of power failure or disaster. This could be tape, but ironically disk is cheaper to run these days than almost all tape systems. A 40Gb cartridge for a tape drive can easily cost $150, which is more than an IDE disk drive that outperforms on practically every level (probably even longevity).

The key is that you use your secondary storage to write out the transaction log, you don't attempt to maintain the data structure on disk like SQL databases do. For high reliability you use a complete duplicate of the system to provide your first level backup with disaster recovery at a remote site.
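
A minimal sketch of that approach, assuming a simple key/value workload: the live data structure exists only in RAM, and the disk sees nothing but an append-only transaction log that is replayed on restart. The class name `LoggedStore`, the JSON line format and the temp-file path are all choices made for this example, not anything from a particular product.

```python
import json
import os
import tempfile

class LoggedStore:
    """In-memory key/value store; disk holds only an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # rebuild RAM state from the log, if one exists
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # Log first and force it to disk, then update the RAM copy.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)  # reads never touch the disk

    def close(self):
        self._log.close()

log_path = os.path.join(tempfile.mkdtemp(), "txn.log")
store = LoggedStore(log_path)
store.put("balance", 100)
store.put("balance", 75)
store.close()

recovered = LoggedStore(log_path)  # stands in for a restart after power loss
print(recovered.get("balance"))    # 75
recovered.close()
```

Writes are sequential appends, which is the access pattern disks are good at. A real system would also need log truncation or snapshots so replay time stays bounded, which this sketch leaves out.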

--
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/

Re:Take a BUSINESS perspective (yes, it's painful. by Colz+Grigor · 2002-02-03 07:04 · Score: 3, Insightful

One other follow-up:

Google will also likely break their technology into three components:

spidering and indexing

searching

caching

Each of the financial analysts for the business groups responsible for each aspect of Google's technology may calculate the value of DRAM vs. HD differently. For searching, latency is extremely critical, but it's not so critical for caching, and there may be some physical problems with solely using DRAM for indexing.

That being said, I would expect Google to use HDs for spidering and indexing, DRAM for searching, and HDs for caching. Mr. Schmidt was probably only discussing technology on the most visible component of Google's technologies: searching.

::Colz Grigor

not even close by Preposterous+Coward · 2002-02-03 07:16 · Score: 2

Assuming 80GB drives each drawing 40 watts of power, and electricity rates at $0.20/kWh, you're looking at an annual power cost of less than $1 per gigabyte of spinning disk storage. That hardly accounts for the difference.
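
Spelled out with the numbers given above (40 watts per 80GB drive, $0.20/kWh, running all year):

```python
watts_per_drive = 40
gb_per_drive = 80
dollars_per_kwh = 0.20
hours_per_year = 24 * 365

kwh_per_year = watts_per_drive * hours_per_year / 1000   # 350.4 kWh
cost_per_drive = kwh_per_year * dollars_per_kwh          # about $70 per drive
cost_per_gb = cost_per_drive / gb_per_drive

print(round(cost_per_gb, 3))   # 0.876
```

That is about 88 cents per gigabyte per year for the electricity alone, with cooling not counted.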

--

"Biped! Good cranial development. Evidently considerable human ancestry."

Re:not even close by RGRistroph · 2002-02-03 09:51 · Score: 2

Every watt of electricity you burn in those datacenters costs you double (at least), because you then have to pay the air conditioning to suck that heat out. And the cost there isn't in the electricity to run the AC, it's the initial and maintenance cost of a bigger AC.

Otherwise you pay much much more than that to replace failed components.

It's unlikely that power accounts for all of Google's choice, but its total impact is mostly NOT the pure cost of the electricity.

Some of the posters are making arguments about needing to access disks fast, and that implying a RAID which is more expensive because you need more disks, and that is probably closest to the mark.

Silly people! by m.dillon · 2002-02-03 07:23 · Score: 3, Insightful

You guys crack me up some times.

I'll lay it out. Obviously Google is not storing the master copy of the full multi-terabyte database in ram, but they are certainly storing as big a chunk in ram as they can, and the cost model ought to be easy for anyone to understand if you sit down and think about it.

Consider the cost difference between the following EQUAL amounts of hard disk storage:

* A 160GB IDE drive

* A 160GB SCSI drive

* Four 40GB drives in an external RAID system

* The cost of a small medium-performance RAID system.

* The cost of a larger high-performance RAID system scalable to a terabyte.

* The cost of an *EXTREMELY* high performance RAID system scalable to multiple terabytes.

Now consider the cost of building, say, a 40 terabyte data store (let's not worry about backups for this experiment). If you build it out of a bunch of huge SCSI drives connected to a bunch of PC's it can be fairly cheap. But if you build it out of, say, high performance EMC arrays it could cost millions of dollars more to get the same theoretical performance.

So when you consider the cost of storage, you always have to consider the cost of the PERFORMANCE you want to get out of that storage. All the Google CEO is saying is that, Doh! It's a hellofalot cheaper to improve the performance aspects of the system by buying DRAM in a distributed-PC environment in order to be able to avoid having to purchase extremely-high performance (and extremely expensive) disk subsystems. The cost of purchasing the DRAM to make up for the lower-performing disk subsystem is actually LOWER than the cost of purchasing an equivalent higher-performance disk subsystem.

The same is true in the ISP world. When RAM was expensive we had to rely on big whopping HD systems to scale machines up. But when RAM became cheap it turned out that you could simply throw in a very high density drive with 1/4 the performance that four smaller drives would give you, and the operating system's RAM cache would take care of the problem. Suddenly we no longer needed to purchase big whopping disk arrays.

Think about it.

-Matt

Re:Five minute rule paper is interesting by billstewart · 2002-02-03 09:18 · Score: 2

I liked it, even though somebody apparently thought it was redundant. It doesn't directly apply to Google, but the principles of trading off speed and cost are still relevant even though the problem's a bit different. One thing I'd find interesting is knowing how much of Google's index data is replicated - one master copy (which might be backed up on disk) kept on N search engine boxes - vs. how much do queries get spread across multiple boxes? Does it make sense to cache the spidering on disk (probably, because rerunning spidering takes a long time, and because the article caches probably don't get hit as often, and don't need the same response speed as the indexing.)

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

But where do you stick the memory? by EvlG · 2002-02-03 09:47 · Score: 2

This sounds plausible, since you can use fewer machines. But the problem I see is, where do you find a machine that can address 80+ gigabytes of memory? Otherwise, you have to buy just as many commodity boxes to hold the ram, which ruins the cost benefit.

Does anyone have any insight on what machines you would use to support this scheme? Does a SAN-type device for RAM exist? Some network-attached box that holds tens or hundreds of gigabytes of RAM?

What did the English language ever do to you? by Nindalf · 2002-02-03 11:17 · Score: 2

I mean, obviously you have some kind of grudge against it, to abuse it that way.

Take the RAM out of your computer and throw it at your workmate/housemate/mum. He or she will say 'Ow!', and it's not because he or she was hit by electrons!

This would, indeed, be the use of RAM as a mechanical object but this type of use is not characteristic. You appear to be claiming with this example that any solid object (and possibly any matter) is a "mechanical component," which is wrong and would be harmful to meaningful communication if accepted.

Any solid object's atoms move in relation to each other. This does not mean it can be said to have "moving parts" (this useful phrase would be rendered meaningless, otherwise), or make it a "mechanical device" (ditto).

Every electrical device is utterly reliant on its physical structure to function properly, and will cease to function properly if its structure is altered beyond certain limits. A broken connection is not a mechanical failure.

Sure, the clip that holds it in place is mechanical, and can suffer mechanical failure, but that is not part of the RAM. To note Telstra's odd problem as evidence of RAM being subject to mechanical failure is like talking about a wind-up alarm clock being struck by lightning as evidence of such clocks being subject to electrical failure (this would, of course, actually be an electrical event causing a mechanical failure).

Re:Cost v Speed by Space+cowboy · 2002-02-03 11:18 · Score: 2

It's a risk, but the problem is that other sites will intermingle .html and .php/.asp depending on whether there is any customisation or even just for headers and footers.

In this case, almost all the documents are in fact dumps of pdf files also on the original site. I chose it because I knew it was big :-)

Besides, for a search engine, getting the catalogue can be a useful thing - in the sort of targetted search engine that I'm maintaining, anyway. A lot of the searches are for particular mathematical models (mainly excel spreadsheets at exorbitant cost). These tend to be catalogued just like any other online shop...

ATB,
Simon

--
Physicists get Hadrons!

Re:not just web pages by Metrol · 2002-02-03 12:37 · Score: 2

the catalog part is still in beta, but it's really amazing.

I hadn't really looked at that part of Google until your post. Based on a couple of searches I did, didn't seem all that amazing to me. More like white knuckle frightening!!

This must be that level of technology that is too easily taken for magic. There are just too many perfectly rational reasons why this "shouldn't" work at all!

--
The line must be drawn here. This far. No further.

Blue screen of second death by leonbrooks · 2002-02-03 12:41 · Score: 2

Why Windows does not run off a ramdrive

It does. But it doesn't help much and means you have to reload the whole RAMdrive (generally over a LAN) when the box dies. Admittedly, it is a more efficient use of RAM than just handing it to Windows, since Windows (particularly the 9X stream) is a hopelessly inefficient user of RAM.

AFAIK Linux and Open BSD cannot do this either.

You must really have spent a lot of time and looked hard before saying that... )-:

``And death and hell were cast into the lake of RAM. Diskless Windows is the second death.'' -- Revelation 20:14, Geek Modified Version

--
Got time? Spend some of it coding or testing

Re:Hard disk is an obsolete technology by Dyolf+Knip · 2002-02-03 17:16 · Score: 4, Interesting

So hard drives are about 10 years ahead of RAM in terms of $/MB? Sounds about right. 1GB hard drives were on the high end of normal users at the time, as is 1GB of RAM today (though I seem to recall having more than 10MB RAM at the time). Assuming the same increases in the next decade... 100GB RAM and 10TB drives. I like.

Solid state everything would be great (wasn't there an article on solid state cooling fans a while back?), but it may take a while for RAM drives to bridge that big a gap, especially given the volatility problem. One big step is the drastic increase in RAM speeds, compared to hard drives which have increased only slightly in that regard.

As someone else said, it is only a matter of time.

--
Dyolf Knip

Caching isn't that great by one-egg · 2002-02-03 19:25 · Score: 2

The standard response to suggestions of storing data in RAM is, "That's dumb; just let the cache do the work." But it turns out that caching doesn't do nearly as well. The overheads involved (such as the cost of finding the block in the cache) make caching significantly worse than using RAM more wisely.

You can learn a bit more about these results from our short paper (PDF) just presented at FAST, or wait for the June Usenix conference to see a longer paper.

Re:Cost v Speed by markmoss · 2002-02-04 07:27 · Score: 2

They're not going to lose refresh because of power failure. No matter what the storage technology is, you don't leave a server farm like Google's at the mercy of the local power grid, you have some sort of generators for backup.

They _will_ lose bits of data. DRAM chips fail. Motherboards fail (taking out perhaps 2G at a time). Cosmic rays flip individual bits. It's much less lost at a time than HD fails, but probably the flipped bits occur far more often. But Google never guaranteed 100% accuracy...

Re:Cost v Speed by PhotoGuy · 2002-02-05 13:54 · Score: 2

Actually, I thought I heard that Google uses single IDE drives, in a whack of distributed generic PC's. No SCSI involved.

And as several other posters commented, *YES*, I AGREE, if speed vs. cost is a factor, then the 65x calculation is less relevant. But it'd take a heck of a lot of requirement for speed to overcome a 65x cost savings (put 30x more machines in place at half the cost, and get the performance you need, with the right architecture).

And one of the most popular (my favorite) search engines might just mandate speed to the point that a 65x cost penalty is *well* worth it.

Man, I wish people would *read* the posts in detail before posting. (Not that *I've* ever been guilty of that :-

-me

--
Love many, trust a few, do harm to none.

Re:Cost v Speed by Shanep · 2002-02-06 02:18 · Score: 2

The idea that all this is on DRAM is staggering.

I remember when AltaVista (back in 1996) was boasting that they had 1GB of RAM for their search engine. :)

But RAM was so cheap up until recently and Google uses so many servers, that I think it probably would be cheaper for them to just work out of RAM. No disk or LAN medium can match RAM for access time, transfer rate and life span and these things are probably most important to Google.

Trying to have extremely fast disk sub systems in each server in the Google farms would probably incur very high expense, yielding much more space than required and much slower access to the space that is actually used.

I don't think this comes down to the typical MB/$ comparison between disks and RAM because Google might only have a gig or so in each server, with lots of servers.

If you're comparing a gig of really fast memory between RAM and disk, it is easy to see which is cheaper. A gig of RAM would have cost me a few months ago in Sydney ~$300.

Whereas a gig of the fastest disk I could possibly get might cost me tens of thousands for a load of 15k RPM SCSI disks and a few 64bit PCI hardware RAID-0 cards, just to get a measly 528MB/s transfer rate, probably half that of the RAM. Access times for the RAM would be astonishingly faster than any disk, and the disk setup would also leave me with many hundreds of gigs that will probably mostly not be required. Far too expensive, far too ineffective. Google needs fast access times and transfer rates the most, but the fastest of SCSI systems will have their transfer rates killed by zillions of very poor access times. Random access doesn't hurt transfer rates with RAM the way it does for disk.

In the end, these machines would probably just end up being configured to each serve what they could cache with RAM so as to keep up with the demand, so why not just boot all these machines off a little flash disk and then just work the engine out of RAM?

This doesn't just come down to needing RAM or disks that can transfer as fast as the network interfaces, since this is not a simple cache or file server. These servers need to search through their whole index as fast as possible, and doing this at super high speeds is going to be much more economical in RAM than on disk. I doubt Google could even be feasible working out of disks.

--
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
