The Future of Databases
gManZboy writes "Ever wonder where database technology is going? This is something that Turing award winner Jim Gray from Microsoft has given a lot of thought to. He recently published an article in which he looks at the many forces pushing database technologies forward, and what those new technologies will look like. Gray writes, 'the greatest of these [research challenges] will have to do with the unification of approximate and exact reasoning. Most of us come from the exact-reasoning world -- but most of our clients are now asking questions that require approximate or probabilistic answers.'"
Ah yes. Harken back to the earlier days, when databases were just files on a file system, and did not distribute the resources at all.
Certainly that's not going to lead to more crashes.
Certainly it's a better idea than, for example, distributing the databases and using load-balancing and regularly scheduled back-ups to ameliorate the loss of the least reliable portions of a database's design - the hard drives.
When you've only got a hammer, everything seems like a nail...what does Hans Reiser do? He could be right. Microsoft is jumping on the filesystem-database wagon with their new filesystem, and we all know that if anyone knows and cares about reliability it's Microsoft.
Mod me down and I will become more powerful than you can possibly imagine!
most of our clients are now asking questions that require approximate or probabilistic answers
Bioinformatics databases are a good example of this. DNA and protein sequence databases are often searched by approximate string-matching algorithms, ranging from "dynamic programming" alignment to hidden Markov models and other stochastic grammars.
Historically, drug target-hunters in Big Pharma created a market for accelerated hardware to facilitate dynamic programming searches, some of which (e.g. Paracel's Fast Data Finder chip) was originally marketed to government agencies who, um, shared an interest in approximate string-matching ;)
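For anyone who hasn't seen the dynamic programming approach, here is a minimal sketch of a local alignment score in the Smith-Waterman style, used to rank sequences by similarity to a query. The scoring values (match=2, mismatch=-1, gap=-2) and the function names are my own assumptions for illustration; real tools use substitution matrices and much more careful gap penalties.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Classic dynamic programming: each cell holds the best score of an
    alignment ending at that pair of positions, floored at zero so that
    an alignment may start anywhere. Only two rows are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a fresh alignment here
                prev[j - 1] + step, # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Rank named sequences by approximate similarity to the query."""
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)
```

Something like search("ACGT", {"x": "ACCT", "y": "TTTT"}) gives you a ranked list rather than a yes/no answer, which is exactly the "approximate answer" flavour Gray is talking about. The inner loop is also why people built special hardware for it: the cost grows with the product of the two sequence lengths.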
When all the glowy boxes said sell the stock market crashed.
The rise of Sarbanes-Oxley highlights a key insecurity in the accountability of enterprise systems. Although the high-level applications can do a good job of tracking who did what to the financial data, the core DB may be open to tampering. If a DB admin with the right password can manually diddle a field in a database, they can change the financials of the company.
In contrast, a secure bitemporal DB would record not only the date of what the data refers to (e.g., the purchase order was entered on March 3rd, 2004) but also the date(s) of any modifications of the data (the quantity and total were changed on December 31, 2004, Uh-Oh!).
This is more than just securing the DB with a hierarchy of privileges; it means that no one can overwrite the old data or change any data without creating an audit trail. This, of course, also means changes in the DB, OS or file system to make critical data only accessible through a secure DB layer that tracks changes (e.g., no accessible plain-text DB data structures). These same concepts could be used (probably are, for all I know) for OSS version control to track who did what and when to the code.
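To make the idea concrete, here is a rough sketch using Python's sqlite3 module. The table name, column names and triggers are all hypothetical, made up for illustration. It also only shows the append-only, two-dates-per-row shape of the thing: anyone who can drop a trigger or edit the file can still tamper, which is exactly why the secure layer underneath matters.

```python
import sqlite3

SCHEMA = """
CREATE TABLE po_history (
    po_id       INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    valid_date  TEXT NOT NULL,  -- when the fact holds in the real world
    recorded_at TEXT NOT NULL,  -- when this version was written to the DB
    recorded_by TEXT NOT NULL   -- who wrote it
);
-- Hypothetical guard rails: reject any attempt to rewrite history.
CREATE TRIGGER po_no_update BEFORE UPDATE ON po_history
BEGIN SELECT RAISE(ABORT, 'po_history is append-only'); END;
CREATE TRIGGER po_no_delete BEFORE DELETE ON po_history
BEGIN SELECT RAISE(ABORT, 'po_history is append-only'); END;
"""


def open_ledger():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record(conn, po_id, quantity, valid_date, recorded_at, user):
    """A 'change' is always a new row; older versions stay where they are."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO po_history VALUES (?, ?, ?, ?, ?)",
            (po_id, quantity, valid_date, recorded_at, user),
        )


def current(conn, po_id):
    """Latest recorded quantity for a purchase order, or None."""
    row = conn.execute(
        "SELECT quantity FROM po_history WHERE po_id = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (po_id,),
    ).fetchone()
    return row[0] if row else None


def audit_trail(conn, po_id):
    """Every version ever recorded, oldest first."""
    return conn.execute(
        "SELECT quantity, valid_date, recorded_at, recorded_by "
        "FROM po_history WHERE po_id = ? ORDER BY rowid",
        (po_id,),
    ).fetchall()
```

With this, the December 31 edit to a March 3 purchase order shows up as a second row with its own recorded_at and recorded_by, instead of silently replacing the first.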
Two wrongs don't make a right, but three lefts do.
On a journalled and properly transacted database, it could never cause corruption.
I've managed databases that are 150 GB large with hundreds of thousands of records in SQL Server 2000 and I have only ever seen one corruption. That was caused by a schema change on a large table during which the power to the machine was cut. When the machine came back online the database was marked as corrupted and set aside for about an hour while it was being recovered. The incomplete changes from the transaction log were rolled back and the database was back online as if nothing happened.
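If you want to see the same roll-back behaviour in miniature, here is a small sketch with Python's sqlite3 module standing in for SQL Server. The accounts table and the function names are invented for the example, and a Python exception is obviously a much gentler failure than a pulled power cord, so take it as an illustration of the principle only.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn, amount, fail_midway=False):
    """Move money from 'a' to 'b' as one transaction."""
    with conn:  # commits if the block finishes, rolls back if it raises
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'a'", (amount,)
        )
        if fail_midway:
            # Simulated failure after the debit but before the credit.
            raise RuntimeError("simulated failure mid-transaction")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'b'", (amount,)
        )


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Call transfer(conn, 50, fail_midway=True), catch the exception, and balances(conn) is unchanged: the half-finished debit never becomes visible.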
Slashdotters- please remember that comments are better when the articles are actually read... that being so...
... it's NOT going to happen...
Here are my problems with what Gray et al said.
Under the mounting onslaught, our traditional relational database constructs--always cumbersome at best--are now clearly at risk of collapsing altogether.
In fact, rarely do you find a DBMS anymore that doesn't make provisions for online analytic processing. Decision trees, Bayes nets, clustering, and time-series analysis have also become part of the standard package, with allowances for additional algorithms yet to come.
He seems to be confusing two issues- one is finding what (data) to find and the other is finding that data. A database enables the second...a program, a human or both in combination with the technologies he mentions is responsible for the first. We should never confuse these two. A database is a way to identify unique things by their properties, literally, a way to distinguish "things" from each other. If it's done that in a manner that provides for ACID and retrieval then our work is done. Finding WHAT to find is another completely different concern.
The relational model is not going anywhere- and that's what every database is - an implementation of the relational model; some better than others.
Gray is a great researcher, and maybe he was talking about the need to make current database PRODUCTS "better".. but it doesn't read like that... it reads like a battle cry for us to move beyond the relational model.
The parent makes a good point, and it's pretty easy to see why if one holds off the usual anti-Reiser reactions and thinks it through a bit.
Databases require a mechanism for atomicity to create their transactions, and because no common operating system has ever provided such, they need to implement it themselves at application level. It's like the bad old days before PCs provided networking, and you had to run up your own networking stack if your application needed comms.
Well, reiserfs has the goal of providing atomic transactions at filestore level, so in principle it will become possible to leave a good chunk of the very hairy rollback processing of conventional RDBMSs to the operating system.
It won't remove the need for proper RDBMSs for power database applications, nor will it in any way obviate the need for database distribution, but it should make professional databases both simpler and more robust. And it should also allow mini-database applications to be coded directly around the filestore with better transactional properties than the traditional flat-file designs.
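For comparison, here is roughly what a flat-file application has to do today to get even single-file atomicity. This is NOT reiserfs transactions, just the usual write-to-a-temp-file-then-rename trick, sketched in Python with a made-up function name. It relies on the rename being atomic when both files live on the same filesystem, and it gives you nothing for updates that span several files, which is the gap a transactional filestore would fill.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Replace the file at path with data, all or nothing.

    Readers see either the old contents or the new contents, never a
    half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # the single atomic step
    except BaseException:
        # Clean up the temp file; the original is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Every mini-database built on flat files ends up reinventing some version of this, plus its own locking and its own multi-file recovery, which is the overhead the parent is hoping the filestore can take over.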
When did adding overhead become the mark of skill?
Never. But using databases to encapsulate business logic (PL/SQL, for example) has been a mark of good developers and engineers for years. Apparently, you've missed that mark...
No, they're not. WinFS is *not* a filesystem, it's a DB layer that sits on top of the filesystem.
And when you consider NTFS *on its own* (like BFS) has the capabilities to do most of what WinFS is supposed to achieve, WinFS just looks sillier and sillier...
> The "next great advancement" in databases will be when I can setup 2 or more linux servers and have
>them act as a single database server. Our database server is the most expensive item in our datacenter
>because it's an N-way IBM server.
lol, IBM has supported *exactly* what you are talking about for at least five years.
That is, you can spread your db2 database across 10, 100, or 1000+ linux commodity boxes (ideally blades). Or you can use windows, or aix, or solaris, or hp-ux, etc. Of course, those individual boxes can be SMPs in their own right - so a thousand 8-way aix boxes is certainly possible, if not cheap.
Oracle is now in this game as well - oracle 10g can certainly support 32, and maybe 64 individual linux boxes in a cluster. The techniques are different between the two - oracle might be better at transactional systems. db2 is definitely better at data warehousing, data mining, etc.
Of course, there are still benefits to a big smp: a single P570 16-way will cost you $250k. But each of those 16 cpus is multi-core (and far faster than intel or amd), and with its micro-partitioning - it can run at least 150 linux or aix lpars (logical partitions). These lpars can grow or shrink as they need - so you aren't always over-buying for size, buying new hardly-used hardware, or having to colocate apps on a busy server - when a different os would be preferable. Not to say everyone should go this way - but there are definite benefits.
2 servers acting as a single database server has been available for many years...e.g., Oracle 9i RAC, Oracle 10g, DB/2's something or other, etc.
Advice: on VPS providers
> So, instead of investing on a DBMS why not buy massive quantities of ECC memory and keep all
> instances of your data in-memory for near instant access?
because a *well-tuned* relational database with a 1:4 ratio of memory to disk is almost as fast as an in-memory database - due to efficient caching
because some queries require an enormous amount of temp space. supporting them can easily double your space requirements - which have to be purchased in memory.
because if you just want to run your database in-memory you can already do that with most databases.
because you don't have the same speed requirements for every piece of data in your database. You might have some tables used for session & user management that are often read & written to and must be very fast. But other tables that just hold seldom-accessed historical data. A modern database would allow you to keep the small & fast tables effectively in memory, and the huge 100 GB history table on disk. And you don't have to buy 100 GB of memory to do it.
because...it's just a bad idea.
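A toy version of the hot-tables-in-memory, history-on-disk split, using SQLite from Python because it fits in a post. The table names and the function name are made up. A big server DBMS gets a similar effect through its buffer cache and tuning rather than through an ATTACH statement, so treat this as a sketch of the idea and not of how DB2 or Oracle do it.

```python
import sqlite3


def open_mixed_store(history_path):
    """One connection: hot table in RAM, bulky history in a file on disk."""
    conn = sqlite3.connect(":memory:")  # 'main' schema lives in memory
    conn.execute("ATTACH DATABASE ? AS archive", (history_path,))
    conn.execute(
        "CREATE TABLE main.sessions (user TEXT PRIMARY KEY, token TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archive.history (user TEXT, event TEXT)"
    )
    return conn


def events_for_active_users(conn):
    """Join the in-memory table against the on-disk one in a single query."""
    return conn.execute(
        "SELECT s.user, h.event FROM main.sessions AS s "
        "JOIN archive.history AS h ON h.user = s.user "
        "ORDER BY s.user, h.event"
    ).fetchall()
```

The session table disappears when the process exits and the history survives in the file, and you only pay for memory on the part that actually needs to be fast.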
Prevayler exposed.
I live in the middle. I'm a DBA Architect, which means I both design and build the databases our company uses. Add to that, we're a small company, and we design very specialized software in a way that not many people can do, so I also wear the hat of C# coder. I understand both sides of this fence, and have actually been in the odd position of fighting for both points of view. A good DBA is responsible for all of the flexible information that makes a modern corp. run. Think about that. All the paper, all the reports, your payroll, everything worth owning informationwise within a company is in a database somewhere. HELL YES these guys live at corporate hq. That said, in a healthy company, the DBAs and devs are able to debate rather than fight. One particularly obstinate Peoplesoft lead dev in my past and I have become very good friends over the years through this kind of argument, so it's not all bad =)
My sympathy, however, does indeed go out to the poor devs who get stuck with some tool that doesn't really understand, or even want to understand, his position as an admin. Too many people slipped into the field with dollars in their eyes in the 90s, and it's led to some truly spectacular screwups. Essentially, in my mind, almost every single failed ERP implementation could and should be blamed on insufficient database administration, and there are LOTS of flameouts there.
The upside... hehe maybe.. is that corporate scrutiny of their IT staff is at an all time high! So if they really suck that bad, their days are probably numbered.
--chitlenz
Imagination is the silver lining of Intelligence.