Is the One-Size-Fits-All Database Dead?
jlbrown writes "In a new benchmarking paper, MIT professor Mike Stonebraker and colleagues demonstrate that specialized databases can have dramatic performance advantages over traditional databases (PDF) in four areas: text processing, data warehousing, stream processing, and scientific and intelligence applications. The advantage can be a factor of 10 or higher. The paper includes some interesting 'apples to apples' performance comparisons between commercial implementations of specialized architectures and relational databases in two areas: data warehousing and stream processing." From the paper: "A single code line will succeed whenever the intended customer base is reasonably uniform in their feature and query requirements. One can easily argue this uniformity for business data processing. However, in the last quarter century, a collection of new markets with new requirements has arisen. In addition, the relentless advance of technology has a tendency to change the optimization tactics from time to time."
It's natural to look at the edges of any feature or performance envelope: people who want to store petabytes of particle accelerator data, run complex queries to serve a million webpages a second, or have hundreds of thousands of employees doing concurrent things to the backend.
But for most uses of databases - or any back-end processing - performance just isn't a factor and hasn't been for years. Enron may have needed a huge data warehouse system; "Icepick Johnny's Bail Bonds and Securities Management" does not. Amazon needs the cutting edge in customer management; "Betty's Healing Crystals Online Shop (Now With 30% More Karma!)" not so much.
For the large majority of uses - whether you measure in aggregate volume or number of users - one size really fits all.
How did Perl & CSV fare?
It failed the "relational" part of the test. But it failed very quickly.
Who thinks that a specialized application (or algorithm) won't beat a generalized one in just about every case?
People use general databases not because they think they're the ultimate in performance, but because the software is already written, already debugged, and -- most importantly -- programmer time is expensive and hardware is cheap.
See also: high level compiled languages versus assembly language*.
(*and no, please don't quote the "magic compiler" myth... "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.)
Sometimes it's best to just let stupid people be stupid.
Looks interesting, will check it out. Working URL for the lazy: http://datadraw.sourceforge.net/
I've made some similar discoveries myself!
Who woulda thought that specific-use items might improve the outcome of specific situations?