MySQL Creator Contemplates RAM-only Databases
Aavidwriter writes "Peter Wayner asks Michael 'Monty' Widenius of MySQL, 'When will RAM prices make disk drives obsolete for database developers?' From Monty's answers, it sounds like hard drives may be nothing but backups before long." From experience, I'd wager that RAM failure rates are less than hard drive failure rates, so it might also mean more stability from that perspective.
I remember reading somewhere that, due to things like thermal radiation and cosmic rays, every so often a bit in RAM is changed by 'accident'... isn't the ECC RAM (which, IIRC, negates the effects of such interaction) horrendously expensive though, more so than the 'normal' SDRAM variants we have these days?
With our Exchange server, we use a Platypus Qik Drive to send our retrieval times through the basement. We put the database on Qik Drives (but mirror it hourly on to HDDs)...it makes our effective Exchange bandwidth limited to the gigabit ethernet port on the server.
Q: "Why do sound techs say 'check 1, 2'?"
A: "Cause if they could count any higher they'd be lighting techs."
...goes to whoever is crazy enough to put their entire database in RAM.
Now if the RAM was non-volatile and was static with the power off that would rock, but volatile RAM - are you crazy?!!
"RAM failure rates are less than hard drive failure rates, so it might also mean more stability from that perspective" Well, that is because they haven't been subjected to that sort of load as yet. RAM could pose its own set of unique problems once it is used to hold databases.
Siggy Say, Siggy Do
When our site was slashdotted last year, we were able to cope with the load after putting our database into RAM. It's probably not the best solution, since the contents of RAM would be lost if the system crashes (or the power goes out, etc.), but it's a good temporary measure.
OLPC Australia
RAM-resident Database? Yes, that would be Google -- a massive, massive cluster of x86 boxen with a couple gigs of RAM apiece. Each gets a portion of the hashspace, leading to near-O(1) searchability. I'm pretty sure all the big search engines work this way, at this point -- the DB is checkpointed to disk, but is never actually run from it.
:-)
Recent discussions about disks vs. CPUs have ignored the massive decreases in the cost of RAM. For a very long time, the secret bottleneck in PCs (in that it wasn't advertised heavily) was RAM. That's starting to disappear -- there's a gig in my laptop, and there would be no discernible improvement in all but the most intense applications if I were to go beyond that.
Virtual Memory is already on the chopping block; any time it's imaginable that a system might need another gig of storage, it's probably worth going to the store and spending the hundred dollars.
But what if more RAM is indeed needed? One of the most interesting developments in this department has involved RDMA: Remote DMA over Ethernet. Effectively, with RAM several orders of magnitude faster than disk, and with Ethernet achieving disk-interface speeds of 120MB/s, we can either a) use other machines as our "VM" failover, or more interestingly, b) Directly treat remote RAM as a local resource -- a whole new class of zero copy networking. This Is Cool, though there are security issues as internal system architectures get exposed to the rough and tumble world outside the box. It'll be interesting to see how they're addressed (firewalls don't count).
What next, for the RAM itself? I don't think there's that much value in further doublings...either of capacity, or soon, of speed. What I'm convinced we're going to start seeing is some capacity for distributed computation in the RAM logic itself -- load in a couple hundred meg in one bank, a couple hundred meg in another, and XOR them together _in RAM_. It'd just be another type of read -- a "computational read". Some work's been done on this, though apparently there are massive issues integrating logic into what's some very dumb, very dense circuitry. But the logic's already done to some degree; ECC verifiers need to include adders for parity checking.
My guess...we'll probably see it in a 3D Accelerator first.
*yawns* Anyway, just some thoughts to spur discussion. I go sleep now
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
http://www.imperialtech.com/success_ebay.htm
The basic idea: use solid state ram drives (with separate power supply) for your busy tablespaces and your redo logs.
This leverages 'cheap ram' technology with existing (and proven and scalable) db architecture.
For ebay, for example, they might store 'active items' in 'ram-drive-backed' tablespace and 'old items' in the 'hard-drive-backed tablespace'.
These solid-state drives are expensive, but additional Oracle licenses (or moving from 'standard' to 'enterprise' or to 'clustered') are very very expensive.
bill m
I told my boss that it was a very Bad Thing as the stores could lose data so easily. He told me several things:
- Running entirely in RAM, the system was very very fast. When the system could smoke a more expensive IBM PC-XT running a dBase app, he could sell more systems
- Every system would have a UPS as he refused to sell them without
- He signed my paycheck, not the other way around
As best I can figure, Darwin was more interested in awarding JATO-assisted drag racers back then because we got lucky and actually had more trouble with the systems using hard drives. That was back during the heyday of the small mom-and-pop video stores. The last of those RAM disk based systems that I knew of converted to a "real" system in 1993. I believe they were assimilated by one of the national chains soon after.
You either believe in rational thought or you don't
Hell, I'd just be happy if they would normalize their tables. I've seen joins across three tables which all hold essentially the same data, and clients that download the entire result set, work on it locally, and then upload the whole thing back when one thing has changed.
And they wonder why I'm crazy.
You think that I'm crazy, you should see this guy!
A quick run of CDW.COM should answer your question. The cheapest 1GB SDRAM DIMM runs a little over $220, making a terabyte a little under a quarter million dollars. On the cheap end of SCSI hard drives, about 20GB can be had for about $120, making a terabyte RAID about $6,000.
Since hard drive space is keeping pace in increasing size and decreasing price, while data storage requirements are shooting further through the stratosphere every time a manager or executive utters the words "data warehouse," the most economical fail-safe solution will always win and a roughly 37:1 cost ratio isn't going to convince anyone to switch. Cheap IDE & SCSI arrays will continue to dominate OLTP applications for the reasonably foreseeable future until such time as that ratio is cut to 1:1, which will happen about the time probe storage atomic memory hits CompUSA.
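For anyone who wants to redo the arithmetic, here is a minimal sketch using only the prices quoted above. It assumes 1 TB = 1024 GB and raw capacity only (no RAID overhead, controllers, or enclosures); the variable names are made up for illustration.

```python
# Prices as quoted in the comment above; everything else is an assumption.
RAM_PRICE_PER_GB = 220.0   # cheapest 1GB SDRAM DIMM
DISK_PRICE = 120.0         # low-end SCSI drive
DISK_SIZE_GB = 20.0
TERABYTE_GB = 1024         # assumption: binary terabyte

ram_cost = RAM_PRICE_PER_GB * TERABYTE_GB               # cost of 1 TB of RAM
disk_cost = DISK_PRICE / DISK_SIZE_GB * TERABYTE_GB     # cost of 1 TB of raw disk
ratio = ram_cost / disk_cost

print(f"1 TB RAM:  ${ram_cost:,.0f}")
print(f"1 TB disk: ${disk_cost:,.0f}")
print(f"RAM is about {ratio:.1f}x the price of disk")
```

On these figures the gap works out to roughly 37:1; adding redundancy on the disk side would narrow it somewhat.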
Normalize until it hurts, denormalize until it works. Until you hit the really huge databases, you won't appreciate that saying.
I'm confused. I actually haven't used MySQL much, and someone else can clarify its current ACID compliance. My application involves multiuser financial transactions. When making my DB selection a couple of years ago, at that time it was claimed that MySQL had some ACID deficiencies that made me nervous. I settled on PostgreSQL, which I'm very happy with.
But there's a lot more to ACID than just keeping RAM and disk in sync, and I don't see how RAM would make ACID that much easier, and certainly not "almost trivial". You still have all the transactional semaphores, record locking, potential deadlocks, rollbacks, etc. to worry about. In fact I don't see why you wouldn't just have the RAM pretend to be a disk and be done with it, since the disk version already has stable software. Then, if it is important to increase performance further, RAM-specific code optimization could be done over time, but slowly and carefully.
I'm sorry - I really don't want to get into a religious war here, but the interview didn't do much to bolster my confidence in MySQL for mission-critical financial stuff. Educate me.
You really can't compare things like that for databases. AISITA (as is stated in the article), the big bottlenecks for both are similar in nature but orders of magnitude apart in scope. I currently work with a medical database where everything has to be logged. Disk access is a big factor for us, so we use Fibre Channel SCSI (specifically Seagate 73.4GB 10000RPM), where the cost is more like 700 dollars for 70GB (basically $10 per GB, not the $6 per GB you are showing). Also there is the issue of supporting hardware, but we will ignore that for the time being.
Time for some napkin math:
1 512MB ECC reg PC2100 DIMM -> $78, or $156/GB
1 70GB Fibre Channel drive -> $700, or $10/GB
Now let's factor in RAID (for access speed and redundancy).
We typically put 8 drives in a bundle, which tends to give us 36% of the total drive capacity (mirrored RAID 5, aka RAID 6; remember the RAM is ECC reg, so this factoring is already in place for it).
8 * $700 -> $5600 for
36% * 8 * 70GB = 200GB
This gives me approximately 1GB for $28.
Now that's a factor of 5.6 (call it 6) in price versus RAM only, AND I still get a prolly 4-fold increase in throughput. Not bad at all in my book.
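The same napkin math as a few lines of Python, in case anyone wants to plug in their own prices. All inputs are the figures from this comment; the 36% usable fraction is taken as given rather than derived from a particular RAID layout.

```python
# Inputs taken from the comment above.
dimm_price, dimm_gb = 78.0, 0.5          # 512MB ECC registered DIMM
drive_price, drive_gb = 700.0, 70.0      # Fibre Channel drive
drives_per_bundle = 8
usable_fraction = 0.36                   # as stated, not derived here

ram_per_gb = dimm_price / dimm_gb
array_cost = drives_per_bundle * drive_price
usable_gb = usable_fraction * drives_per_bundle * drive_gb
disk_per_gb = array_cost / usable_gb
factor = ram_per_gb / disk_per_gb

print(f"RAM:  ${ram_per_gb:.0f}/GB")
print(f"Disk: ${disk_per_gb:.0f}/GB usable ({usable_gb:.0f}GB for ${array_cost:.0f})")
print(f"RAM costs about {factor:.1f}x as much per usable GB")
```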
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK
If you've got enough ram for your database to fit in, why not mmap it and do a simple search? It tends to take up much less memory than a database and you can search a whole lot of records in the time it takes to do a context switch (which is what you get when you use a socket to talk to the database program).
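A rough sketch of what that could look like, using Python's standard `mmap` module. The fixed 32-byte record layout, the `key:value` format, and the function names are all made up for the example; it is a plain linear scan, not an index.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 32  # hypothetical fixed record width, in bytes


def write_records(path, records):
    """Write each (key, value) pair as one fixed-width, space-padded record."""
    with open(path, "wb") as f:
        for key, value in records:
            line = f"{key}:{value}".encode("ascii")
            if len(line) > RECORD_SIZE:
                raise ValueError("record too long for the fixed layout")
            f.write(line.ljust(RECORD_SIZE))


def lookup(path, key):
    """Map the file into memory and scan it record by record for key."""
    prefix = f"{key}:".encode("ascii")
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset in range(0, len(mm), RECORD_SIZE):
            record = mm[offset:offset + RECORD_SIZE]
            if record.startswith(prefix):
                return record[len(prefix):].rstrip().decode("ascii")
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.dat")
    write_records(path, [("alice", "42"), ("bob", "17"), ("carol", "99")])
    found = lookup(path, "bob")
    missing = lookup(path, "dave")

print(found, missing)
```

No socket and no separate server process are involved; the trade-off is that you get no transactions, no concurrency control, and no query language.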
Back in the early 90's IBM added a machine instruction to their mainframes called DIV. It treated data in a file system as if it were in virtual memory - i.e. addressRecord[12345] appeared to the program as an in-memory array, but was backed by disk storage - the same format that was used for paging virtual memory - brilliant. It's a shame it never caught on - it would make advances like this transparent in implementation. Well, I guess you can't really say it never caught on - it was a big reason IBM's mainframe databases outperformed everyone else for so long.
Is there a similar kind of instruction on Intel? It's probably too late though - indexed arrays have become less useful since associative array patterns have become better defined. A hardware implementation (RAM) of JDO would be interesting.
slashdot troll = you make a compelling argument I do not like the implications of.
There are components of ACIDity that would be implemented very differently for RAM-persistent databases than for disk-persistent ones. Maintaining ACIDity on disk-persistent databases requires complicated algorithms to mitigate the disastrous disk seek times. These complicated algorithms would be rendered unnecessary if disks were no longer used.
For example, disks have incredibly slow seek times and much better bandwidth; therefore it's far cheaper to write things to disk in big chunks. The purpose of write-ahead logging (or "redo logging") is to mitigate the performance impact of slow seek times by blasting all the transactions to disk at once, in the redo log, thereby avoiding the slow seeks that would be required by putting each transaction in its proper place. Putting the transaction data in its proper place is deferred until after the load has died down somewhat. This could be seen as exchanging seek times for bandwidth.
This redo log mechanism would be unnecessary for ram-persistent databases. It's a significant source of complexity that would be obviated by the removal of disks. And that's just one example of complexity required to get adequate performance from disk, a medium that has disastrously slow seek times.
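To make the mechanism concrete, here is a toy sketch of the append-first idea described above: every change is appended sequentially to a log and forced to disk before the in-memory copy is updated, and the log is replayed on startup. The `TinyStore` class, the JSON-lines log format, and the file name are invented for illustration; this is not how MySQL or any real engine implements its redo log, and it ignores torn writes, checkpoints, and log truncation.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store: sequential redo log on disk, live data in RAM."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state by re-applying every logged change."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Append to the end of the log (no seeking), force it to disk,
        # and only then update the in-memory copy.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def close(self):
        self.log.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "redo.log")

    store = TinyStore(path)
    store.put("alice", 100)
    store.put("bob", 50)
    store.put("bob", 250)
    store.close()            # stand-in for a crash: the RAM copy is gone

    recovered_store = TinyStore(path)
    recovered = dict(recovered_store.data)
    recovered_store.close()

print(recovered)
```

In this toy the log is the only thing that ever touches the disk, and it is written strictly front to back.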
the interview didn't do much to bolster my confidence in MySQL for mission-critical financial stuff. Educate me.
Cancel that financial adjective. You're thinking too narrowly.
Infuriate left and right
I believe it surfaced a while back on /. - can't find any links at the moment, but AFAIK the entire Google index is stored in RAM.
grisha.org
I fondly remember learning to program in C with a recoverable RAM disk on the Amiga. I was able to copy most of my source, headers, libs and even the compiler into VD0: and do everything from RAM. Reset the computer due to a crash and most of the time the RAM disk was intact. I'm sure that quick cycle helped me to learn C faster.
A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?
Just from curiosity, how much data are we talking about for a large corporation, say SW Air or BofA?
Impossible to put a figure on the total amount of data that exists within an organization, but a typical SAN in a major financial institution has terabytes online. UBS Warburg has 2 TB in just its general ledger database. Acxiom has 25 TB in its data warehouse, which will mainly be used for queries, whereas the GL database will be more transaction heavy. Southwest is an Oracle customer, but it doesn't say here how much data they have.
We need archival storage devices that won't lose data unless physically destroyed. We don't have them. Tapes don't hold enough data any more. Disk drives don't have enough shelf life.
DVD-sized optical media in caddies for protection, maybe.
(It's annoying that CDs and DVDs went caddyless. Early CD drives used caddies to protect the CDs, but for some idiotic reason, the caddies cost about $12 each. We should have had CDs and DVDs in caddies, with the caddy also being the storage box and the retail packaging for prerecorded media. There's no reason caddies have to be expensive. 3.5" floppies, after all, are in caddies.)
Valid points, of course :) But you have to admit that for simple home pages (and not corporate databases) MySQL is simple, to-the-point, and easy to use.
And free, although many other ones are free as well. (I wouldn't want to run Oracle@Home ... of course, PostgreSQL is also free, and I hear it's more mainstream as far as true database functionality.)
"Careful" people can enforce their own data integrity - obviously, it gets harder as the size (number and complexity of tables, I mean) of the database expands.
Can you tell I use it myself? :) You sound like you have experience with other database systems. How difficult do you think it would be to port an existing MySQL+PHP system to PostgreSQL or something similar?
Lucent had an in-memory database that they used a couple of years ago--it was called DataBlitz.
They offered me a job (I was also talking with the folks at TimesTen).
Main memory databases are great for a couple of tasks
1) Native hardware
2) Serving up static data accessed through a relational model
3) Being a front end for a (standard) relational database
4) Being a nice programmatic way of defining data structures in your own application. (Wouldn't it be nice to be able to have tables and joins as 'regular' data structures in a C++ program?)
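As a small illustration of point 4, here is a sketch using the in-memory mode of Python's standard `sqlite3` module. SQLite is not one of the products mentioned in this thread, the example is in Python rather than C++, and the table and column names are made up; the point is only that tables and joins can live entirely inside the application's own process.

```python
import sqlite3

# ":memory:" keeps the whole database in this process's RAM; nothing is
# written to disk and the data disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# An ordinary relational join, used like any other in-program data structure.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(rows)
```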
The internal technology is quite different--the bottleneck usually is CPU rather than disk related, even though most of the CPU instructions that deal with minimizing disk usage have been removed.
The field is really exciting, and I can't wait until the prices of the products come down enough so that I can just go out and buy it.
TimesTen is incredibly expensive for what you get when you consider their SQL support is very weak in many areas (for example, nested queries).
Also, it's notable that TimesTen no longer advertises themselves as an in-memory database. They now claim to be a "Real Time Event Processing" system. Perhaps this is a hint that RAM databases are not as appealing in a marketplace flooded with cheap disk drives.