Is the One-Size-Fits-All Database Dead?
jlbrown writes "In a new benchmarking paper, MIT professor Mike Stonebraker and colleagues demonstrate that specialized databases can have dramatic performance advantages over traditional databases (PDF) in four areas: text processing, data warehousing, stream processing, and scientific and intelligence applications. The advantage can be a factor of 10 or higher. The paper includes some interesting 'apples to apples' performance comparisons between commercial implementations of specialized architectures and relational databases in two areas: data warehousing and stream processing." From the paper: "A single code line will succeed whenever the intended customer base is reasonably uniform in their feature and query requirements. One can easily argue this uniformity for business data processing. However, in the last quarter century, a collection of new markets with new requirements has arisen. In addition, the relentless advance of technology has a tendency to change the optimization tactics from time to time."
1) More and more specialized databases will begin cropping up.
2) Mainstream database systems will modularize their engines so they can be optimized for different applications and they can incorporate the benefits of the specialized databases while still maintaining a single uniform database management system.
3) Someone will write a paper about how we've gone from specialized to monolithic...
4) Something else will trigger specialization... (repeat)
Dvorak, if you steal this one from me, I'm going to stop reading your writing... oh wait.
It's natural to look at the edges of any feature or performance envelope: people who want to store petabytes of particle accelerator data, run complex queries to serve a million webpages a second, or have hundreds of thousands of employees doing concurrent things to the back end.
But for most uses of databases, or of any back-end processing, performance just isn't a factor and hasn't been for years. Enron may have needed a huge data warehouse system; "Icepick Johnny's Bail Bonds and Securities Management" does not. Amazon needs the cutting edge in customer management; "Betty's Healing Crystals Online Shop (Now With 30% More Karma!)" not so much.
For the large majority of uses - whether you measure in aggregate volume or number of users - one size really fits all.
Trust the Computer. The Computer is your friend.
steve
(+1 Sarcastic)
Oh, you're not stuck, you're just unable to let go of the onion rings.
I was just thinking about writing an article on the same issue.
The problem I've noticed is that too many applications are becoming specialized in ways that traditional databases don't handle well. The key example of this is forum software. The data is truly hierarchical in nature, of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system. Yet we keep trying to cram it into SQL databases, then get surprised when we're hit with performance problems and security issues. It's simply the wrong way to go about solving the problem.
As anyone with a compsci degree or equivalent experience can tell you, creating a custom database is not that hard. In the past it made sense to go with off-the-shelf databases because they were more flexible and robust. But now that modern technology is causing us to fight with the databases just to get the job done, the time saved from generic databases is starting to look like a wash. We might as well go back to custom databases (or database platforms like BerkeleyDB) for these specialized needs.
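To make the "custom database is not that hard" point concrete, here is a minimal Python sketch of a specialized store for threaded posts. It assumes a hypothetical materialized-path key scheme (fixed-width, zero-padded components such as "0001/0002"), so a whole thread or sub-thread comes back from one ordered range scan. It is in-memory only and deliberately skips persistence, concurrency and crash recovery, which is exactly the part BerkeleyDB or an RDBMS hands you for free.

```python
import bisect
from typing import Dict, List, Tuple


class ThreadStore:
    """Toy in-memory store for threaded forum posts.

    Keys are hypothetical materialized paths like "0001/0002/0001"; every reply
    under a post shares that post's path as a prefix, so with fixed-width
    components a subtree is one contiguous run in sorted key order.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []          # post paths, kept sorted
        self._bodies: Dict[str, str] = {}   # path -> post body

    def add(self, path: str, body: str) -> None:
        # Insert the key in sorted position the first time we see it.
        if path not in self._bodies:
            bisect.insort(self._keys, path)
        self._bodies[path] = body

    def subtree(self, path: str) -> List[Tuple[str, str]]:
        """Return the post at `path` plus all replies beneath it, in thread order."""
        result = []
        i = bisect.bisect_left(self._keys, path)
        while i < len(self._keys):
            key = self._keys[i]
            # Stop as soon as we leave the range "path" / "path/...".
            if key != path and not key.startswith(path + "/"):
                break
            result.append((key, self._bodies[key]))
            i += 1
        return result


store = ThreadStore()
store.add("0001", "Is the one-size-fits-all database dead?")
store.add("0001/0001", "reply A")
store.add("0001/0001/0001", "reply to A")
store.add("0001/0002", "reply B")
store.add("0002", "another thread")
print([key for key, _ in store.subtree("0001/0001")])  # ['0001/0001', '0001/0001/0001']
```

The design choice is the key layout: making the hierarchy part of the sort order turns "fetch this sub-thread" into a range read instead of a recursive query.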
Javascript + Nintendo DSi = DSiCade
Who thinks that a specialized application (or algorithm) won't beat a generalized one in just about every case?
The reason people use general databases is not because they think it's the ultimate in performance, it's because it's already written, already debugged, and -- most importantly -- programmer time is expensive, and hardware is cheap.
See also: high level compiled languages versus assembly language*.
(*and no, please don't quote the "magic compiler" myth... "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.)
Sometimes it's best to just let stupid people be stupid.
We're all sick of "new fad: X is dead?" articles. Please reduce lameness to an acceptable level!
Can't we get used to the fact that specialized and new solutions don't magically kill the existing popular solutions to a problem?
And it's not a recent phenomenon, either. I bet it goes back to when the first proto-journalistic phenomena formed in early human societies, and it haunts us to this very day...
"Letters! Spoken speech dead?"
"Bicycles! Walking on foot dead?"
"Trains! Bicycles dead?"
"Cars! Trains dead?"
"Aeroplanes! Trains maybe dead again this time?"
"Computers! Brains dead?"
"Monitors! Printing dead yet?"
"Databases! File systems dead?"
"Specialized databases! Generic databases dead?"
In a nutshell. Don't forget that a database is a very specialized form of storage system; you can think of it as a very special sort of file system. It didn't kill file systems (as noted above), so specialized systems will thrive just as well without killing anything.
Anyhow, I *wish* file systems were dead. They have grown into messy trees that are unfixable because trees can only handle about 3 or 4 factors and then you either have to duplicate information (repeat factors), or play messy games, or both.
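A minimal sketch of the set-based alternative being wished for here, assuming a hypothetical in-memory `TagStore` in Python: each file carries a set of tags and a lookup is a subset test, so no single factor has to be chosen as the top of a hierarchy and nothing has to be duplicated. It ignores persistence and says nothing about whether users would actually find this less confusing.

```python
from typing import Dict, List, Set


class TagStore:
    """Hypothetical tag-based file index: every file has a set of tags."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[str]] = {}

    def tag(self, filename: str, *tags: str) -> None:
        # Add tags to a file, creating its entry on first use.
        self._tags.setdefault(filename, set()).update(tags)

    def find(self, *tags: str) -> List[str]:
        # Return files carrying ALL requested tags, sorted by name.
        wanted = set(tags)
        return sorted(name for name, have in self._tags.items() if wanted <= have)


store = TagStore()
store.tag("q1.pdf", "2008", "acme", "invoice")
store.tag("q2.pdf", "2008", "globex", "invoice")
store.tag("notes.txt", "2008", "acme", "memo")
# A tree forces a choice between /2008/acme/... and /acme/2008/...;
# here the tags can be queried in any combination.
print(store.find("2008", "invoice"))  # ['q1.pdf', 'q2.pdf']
print(store.find("acme"))             # ['notes.txt', 'q1.pdf']
```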
You know, I've seen my share of RDBMS designs, enough to know the "messiness" is not the fault of the file systems (or of databases, for that matter).
Sets have more issues than you describe, and you know very well that Vista had lots of set-based features that were later scaled back, hidden and reduced. That wasn't because WinFS was dropped (the sets in Vista don't use WinFS; they work with indexing too), but because they were terribly confusing to users.
I don't think that you know Oracle very well. Let's say you want to scale, so you want clustering or grid functionality: built into Oracle. Let's say that you want to partition your enormous table into one physical table per month or quarter: built in. Oh, and if you query the whole giant table, you'd like parallel processes to run against each partition, balanced across your cluster or grid: yeah, that's built in too. Let's say you almost always fetch a group of data together rather than piece by piece, so you want it physically colocated to reduce disk I/O: built in.
This is why you pay a good wage for your Oracle data architect and DBA: so that you get people who know how to do these sorts of things when needed. And honestly I'm not even scratching the surface.
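The partition-per-month idea can be sketched in a few lines. This is a toy Python model, not how Oracle implements partitioning: the `MonthPartitionedTable` class, the "YYYY-MM" routing rule and the `date` column are all assumptions for illustration. It shows the two ideas from the comment, partition pruning (skip months the query can't touch) and one scan task per partition. Python threads are limited by the GIL, so this demonstrates the structure rather than a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Set


class MonthPartitionedTable:
    """Toy table with one partition (a plain list) per "YYYY-MM" month."""

    def __init__(self) -> None:
        self.partitions: Dict[str, List[dict]] = defaultdict(list)

    def insert(self, row: dict) -> None:
        # Route the row by the month prefix of its ISO-8601 date string.
        self.partitions[row["date"][:7]].append(row)

    def scan(self, predicate: Callable[[dict], bool],
             months: Optional[Set[str]] = None, workers: int = 4) -> List[dict]:
        # Partition pruning: only visit partitions the query can touch.
        targets = [m for m in sorted(self.partitions) if months is None or m in months]

        def scan_one(month: str) -> List[dict]:
            return [row for row in self.partitions[month] if predicate(row)]

        # One task per partition; map() yields results in month order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            chunks = list(pool.map(scan_one, targets))
        return [row for chunk in chunks for row in chunk]


table = MonthPartitionedTable()
for date, amount in [("2008-01-15", 10), ("2008-02-03", 20), ("2008-02-20", 5), ("2008-03-01", 30)]:
    table.insert({"date": date, "amount": amount})
print([r["amount"] for r in table.scan(lambda r: r["amount"] >= 10)])        # [10, 20, 30]
print([r["amount"] for r in table.scan(lambda r: True, months={"2008-02"})])  # [20, 5]
```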
Consider a data warehouse for a giant telecom in South Africa (with a DBA named Billy, in case you wondered). You have over a billion rows in your main fact table, but you're only interested in a few thousand of them. You have an index on dates, another index on geographic region and another index on customer. Any one of those indexes will reduce the 1.1 billion rows to tens of millions of rows, but all three restrictions together will reduce it to a few thousand. What if you could read the three indexes, perform bitmap comparisons on the results to keep only the rows that match all three, and then fetch only those few thousand rows from the 1.1 billion row table? Yup, that's built in, and Oracle does it for you behind the scenes.
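The "read three indexes, AND the bitmaps, then fetch" trick can be shown on a toy scale. This Python sketch uses plain integers as uncompressed bitmaps (bit i set means row i matches) and a hypothetical five-row fact table; the column names and data are invented for illustration, and Oracle's real bitmap machinery is considerably more involved and isn't shown here.

```python
from typing import Dict, Iterable, List


def build_index(table: List[dict], column: str) -> Dict[object, List[int]]:
    """Map each distinct value in `column` to the row ids that hold it."""
    index: Dict[object, List[int]] = {}
    for rid, row in enumerate(table):
        index.setdefault(row[column], []).append(rid)
    return index


def to_bitmap(row_ids: Iterable[int]) -> int:
    """Encode row ids as an int bitmap: bit i set means row i matches."""
    bitmap = 0
    for rid in row_ids:
        bitmap |= 1 << rid
    return bitmap


def bitmap_rows(bitmap: int) -> List[int]:
    """Decode a bitmap back into ascending row ids."""
    rows, rid = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(rid)
        bitmap >>= 1
        rid += 1
    return rows


def bitmap_and_query(table, indexes, predicates):
    """AND one bitmap per equality predicate, then fetch only the surviving rows."""
    combined = (1 << len(table)) - 1  # start with "every row matches"
    for column, value in predicates.items():
        combined &= to_bitmap(indexes[column].get(value, []))
    return [table[rid] for rid in bitmap_rows(combined)]


# Hypothetical miniature fact table.
fact_table = [
    {"month": "2008-01", "region": "north", "customer": "A", "amount": 10},
    {"month": "2008-01", "region": "north", "customer": "B", "amount": 20},
    {"month": "2008-02", "region": "north", "customer": "A", "amount": 30},
    {"month": "2008-01", "region": "south", "customer": "A", "amount": 40},
    {"month": "2008-01", "region": "north", "customer": "A", "amount": 50},
]
indexes = {col: build_index(fact_table, col) for col in ("month", "region", "customer")}
hits = bitmap_and_query(fact_table, indexes,
                        {"month": "2008-01", "region": "north", "customer": "A"})
print([row["amount"] for row in hits])  # [10, 50]
```

Each predicate alone matches four of the five rows; only the AND of all three narrows it to two, and only those two rows are ever fetched from the table.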
Now yeah, you can build a faster single-purpose db. But you'd better have a god damned lot of dev hours allocated to the task. My bet is that you'll probably come out way ahead in cash & time to market with Oracle, a good data architect and a good DBA. Any time you want to put your money on the line, you let me know.
As any English teacher will tell you, any language that will support great poetry and prose will also make it possible to write the most gawdawful cr*p. Perl bestows great powers, but the perl user must temper his cleverness with wisdom if he is to truly master his craft.
However, in this specific case, Google reveals that
was simply "borrowed" from y-combinator.pl. This is an instance of Perl being used in a self-referential manner to add a new capability: the Y combinator allows recursion of anonymous subroutines (why anyone would bother to do such an arcane thing comes back to the English teacher's remarks). Self-referential statements are always difficult to understand because, well, they just are that way (including this one).

Outside of "Golfing", I'd strongly disagree. I don't think the community encourages it for the most part.
This is from someone who's spent the last seven years with Perl and in the community. YMMV
-William Shatner can be neither created nor destroyed.
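For anyone wondering what "recursion of anonymous subroutines" via the Y combinator looks like, here is a minimal sketch in Python rather than Perl (the y-combinator.pl code itself isn't reproduced in this thread). Because Python evaluates arguments eagerly, it uses the strict "Z" variant of the Y combinator; the names `z_combinator` and `factorial` are illustrative only.

```python
def z_combinator(f):
    """Strict-evaluation fixed-point combinator (the "Z" variant of Y).

    `f` receives a function to call for recursion and returns the real
    function, so the recursive lambda never has to refer to its own name.
    """
    def self_apply(x):
        # Wrapping x(x) in a lambda delays it; without that, eager
        # evaluation would recurse forever before f is ever called.
        return f(lambda v: x(x)(v))
    return self_apply(self_apply)


# An anonymous factorial: the inner lambda recurses through `recur`,
# never through a name bound to itself.
factorial = z_combinator(lambda recur: lambda n: 1 if n == 0 else n * recur(n - 1))
print(factorial(5))  # 120
```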