Brian Aker On the Future of Databases

Posted by kdawson on Tuesday June 3, 2008 @10:52AM from the everything-you-know-changed dept.

blackbearnh recommends an interview with MySQL Director of Technology Brian Aker that O'Reilly Media is running. Aker talks about the merger of MySQL with Sun, the challenges of designing databases for a SOA world, and what the next decade will bring as far as changes to traditional database architecture. Audio is also available. From the interview: "I think there's two things right now that are pushing the changes... The first thing that's going to push the basic old OLTP transactional database world, which... really hasn't [changed] in some time now, is really a change in the number of cores and the move to solid state disks because a lot of the... concept around database is the idea that you don't have access to enough memory. Your disk is slow, can't do random reads very well, and you maybe have one, maybe eight processors but... you look at some of the upper-end hardware and the many-core stuff,... and you're almost looking at kind of an array of processing that you're doing; you've got access to so many processors. And well the whole story of trying to optimize... around the problem of random I/O being expensive, well that's not that big of a deal when you actually have solid state disks. So that's one whole area I think that will... cause a rethinking in... the standard Jim Gray relational database design."

Well by CrispBH · 2008-06-03 11:03 · Score: 5, Funny

I couldn't... agree... more... I'd say that some... very valid... points... have been... raised.
Dear Slashot by CopaceticOpus · 2008-06-03 11:07 · Score: 5, Funny

Can we please have another loud, circular debate over which database is best? It's the only way your favorite database will ever win.

Thank you.
1. Re:Dear Slashot by dave87656 · 2008-06-03 16:48 · Score: 5, Interesting
  
  Okay, I'll bite too ...
  
  We've been running MySQL using MyISAM since 2002. It delivered acceptable performance until recently, but as we've expanded our application and the data volumes have increased, we now have to reorganize it on a frequent basis (we just back up and restore).
  
  But we really need to move to a transactional model, so I've done some benchmarking between InnoDB and PostgreSQL. In almost all cases, PostgreSQL was significantly faster. Our application is very transactional, with a lot of writes.
  
  And from what I've read, PostgreSQL scales well to multiprocessors and multiple cores, whereas MySQL does not. I know Falcon is coming, but it was still very alpha at the time I compared; I couldn't get it to run long enough to perform the tests.
  
  Has anyone else compared PostgreSQL to MySQL/InnoDB?
Leaky abstractions by yoris · 2008-06-03 11:10 · Score: 5, Interesting

Gotta love that link between the hardware limitations and the software concepts that may seem fancy but are essentially only built to get around them. I believe someone once called it "the law of leaky abstractions" - would be interesting to see what the new limitations would be if you start combining solid-state storage with pervasive multiprocessing, i.e. what can you do with a multi-processor multi-SSD server that you cannot do with a single-processor single-hard-drive server?

I think TFA is pretty right on the money that parallelization and massive use of SSDs could cause some pretty fundamental changes in how we approach database optimization - if I were to imagine that rack that I'm staring at being filled with SSDs and processors instead of with nothing but hard drives... locality of data takes on a whole new meaning if you don't require data to be on the same sector of the HD, but rather want certain sets of data to be stored on storage chips located around the same processor chips to avoid having to overload your buses.

Then again, I haven't been in this game for so long, so maybe I'm overestimating the impact. Oldtimer opinion would be very welcome.
This IS news! by Bluesman · 2008-06-03 11:16 · Score: 5, Funny

MySQL has people who are responsible for *designing* it? I'm shocked, Shocked.

--
If moderation could change anything, it would be illegal.
Locality is the key by Dave500 · 2008-06-03 13:55 · Score: 5, Interesting

In my mind, as a database engineer for a Wall Street bank, the biggest change in the near term that we foresee is data locality.

Given the amount of computing power on hand today, it may surprise many how difficult it is to engineer a system capable of executing more than a few thousand transactions per second per thread.

Why? Latency. Consider your average SOA application which reaches out to 4-5 remote services or dataserver calls to execute its task. Each network/rpc/soap/whatever call has a latency cost of anything between one and, at worst, several hundred milliseconds. Let's say for example that the total latency for all the calls necessary is 10 milliseconds. If the thread makes those calls one after another and blocks on each, that's 1000/10 = 100 transactions per thread per second. Oh dear.
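To make that arithmetic concrete, here is a minimal Python sketch. It assumes a thread that makes its remote calls serially and blocks on each one; the function name and the 5 x 2 ms split are purely illustrative, not taken from any real system.

```python
def max_tps_per_thread(call_latencies_ms):
    """Upper bound on transactions per second for a single thread
    that makes its remote calls one after another and blocks on each."""
    total_ms = sum(call_latencies_ms)
    if total_ms <= 0:
        raise ValueError("total latency must be positive")
    # 1000 ms in a second, divided by the time one transaction spends waiting.
    return 1000.0 / total_ms


# Hypothetical split: five remote calls at 2 ms each = 10 ms per transaction.
latencies = [2.0] * 5
print(max_tps_per_thread(latencies))  # 100.0
```

CPU time is ignored here on purpose: with 10 ms spent waiting on the network, the ceiling is set by latency long before the processor matters.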

The amount of memory an "average" server ships with today is in the 32-64GB range. Next year it will be in the 64-128GB range. The average size of an OLTP database is 60-80GB.

So, the amount of memory available to the application tier will very soon be greater than the size of the database, warehouses excluded. Moore's law is quickly going to give the application tier far more memory than it needs to solve the average business state, exceptions noted.

The final fact in the puzzle is that for transaction processing, read operations outnumber write operations by roughly 20 to 1. (This will of course vary by system, but that *is* the average.)

This situation is strongly in favor of migrating read-only data caches back into the application tier, and only paying for the network hop when writes are done, in the interests of safety. (There is a lot of research into how writes can be done safely asynchronously at the moment, but it's not ready yet IMHO.)
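A rough sketch of that idea in Python, under loud assumptions: `BackingStore` is a stand-in for the remote database that just counts simulated network hops, and `ReadCache` is a hypothetical application-tier cache that serves reads locally and writes through synchronously. Neither name comes from any of the products linked below, and invalidation across multiple application servers is left out entirely.

```python
class BackingStore:
    """Stand-in for the remote database; counts simulated network hops."""

    def __init__(self):
        self._rows = {}
        self.hops = 0

    def read(self, key):
        self.hops += 1
        return self._rows.get(key)

    def write(self, key, value):
        self.hops += 1
        self._rows[key] = value


class ReadCache:
    """Application-tier cache: reads are local, writes go through synchronously."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, key):
        # Only a cache miss pays for the network hop.
        if key not in self._cache:
            self._cache[key] = self._store.read(key)
        return self._cache[key]

    def put(self, key, value):
        # Write to the store first, then update the local copy.
        self._store.write(key, value)
        self._cache[key] = value


store = BackingStore()
cache = ReadCache(store)
cache.put("acct:1", 100)   # one write, one hop
for _ in range(20):        # twenty reads, all served from local memory
    cache.get("acct:1")
print(store.hops)  # 1
```

With the 20-to-1 read/write mix, 21 operations cost a single network hop instead of 21. The hard parts (keeping several caches coherent, recovering after a crash) are exactly what this toy version skips.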

Challenges exist in terms of efficient data access and manipulation when caches are large, performant garbage collection and upset recovery - but they are all solvable with care.

It's my opinion that in the near future large data caches in the application tier will become the norm. What has to be worked out is the most effective way of accessing, manipulating and administering that data tier and dealing with all the inevitable caveats of asynchronous data flow.

Some (not complete) examples of implementing this:

Relational Caches (there are many more):
http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_for_java.html
http://www.alachisoft.com/ncache/index.html

Object Caches:
http://www.ogf.org/OGF21/materials/970/GigaSpaces_DataGrid_OGF_Oct07.ppt
http://jakarta.apache.org/jcs/