Beyond Relational Databases

KISS by mcrbids · 2005-05-24 07:14 · Score: 5, Insightful

Some of the biggest problems that "new" database designs have:

1) Overly complex

2) Don't scale

3) Tied to a single platform/implementation

4) Poor performance

It's typical to see all four in a single try!

SQL, on the other hand:

1) Reasonably simple API

2) Scales to very large databsaes

3) Cross-platform/architecture

4) Performs very well.

Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Re:KISS by NoOneInParticular · 2005-05-24 07:51 · Score: 3, Insightful

Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.
As for SQL I do not agree with point 4: SQL does not perform very well. We're in the age of Ghz processors, fast disk drives and it *still* is an performance issue to add a few million records to a database? What SQL sorely lacks is a recognition that the 4th generation of software languages was a bad mistake, and get back to a third generation language: explicit indices, explicit loops over these indices and fast (compiled) execution of said loops. Just freaking program the database instead of waving chicken bones at it.
Re:KISS by MSBob · 2005-05-24 08:00 · Score: 3, Insightful

Relational algebra is a very nice and tidy concept but SQL is a piss poor (and limited) implementation thereof.
In other words, relational databases are very nice and elegant but their interface (SQL) is bad and should be replaced.

Also relational databases by themselves can't supply the needs of a typical enterprise and that's where technologies such as OLAP are built on top of RDBMS to make certain data manipulations efficient.

--
Your pizza just the way you ought to have it.
Re:KISS by WaterBreath · 2005-05-24 08:57 · Score: 3, Insightful

Usually a piss-poor database reflects on piss-poor developers.
This is a BIG IMPORTANT POINT. But, unfortunately, the people that are architecting the databases in these cases usually don't realize that they're producing bad designs. All they see is "The database is slow. Why the *!?$ is it so slow!?" There's usually one or two answers: 1) Your schema is organized poorly. 2) You're just plain juggling too much data at once. And let's not forget the deadly combination of both 1 and 2. 100M rows, for example, is a lot of data, any way you look at it. If you've got 100 bytes per row (remarkably small in many cases), that's 10G of data. If you're shuffling through all that data every time, yeah, it's gonna be slow. Ya might want to look into reorganizing, or archiving.
Re:KISS by waveman · 2005-05-24 09:08 · Score: 2, Insightful

> SQL, on the other hand:
> 1) Reasonably simple API...

The outrageous success of SQL is in part because it complies with the "second law of axiomatic design": a good design has minimum entropy. Prior to SQL a lot of application knowledge was duplicated in the database which increased entropy. Also a lot of knowledge of the physical structure of the data was embedded in the programs which also increased entropy. This is bad because the more redundant information the system contains the harder it is to change.

With SQL you can change the physical structure without rewriting all your programs. And you can change your code without affecting the DBMS design eg you can get the rows back in a different order. In the bad old days if you wanted things in a different order you had to reload the database and recompile all your programs.

Even though SQL uses 3 to 10 times as many CPU cycles as hierarchical DBMSs or ISAM files it took over because of this flexibility.

Similarly, I think that one of the reasons OO programming has not delivered the productivity benefits it was supposed to is that OO somehow encourages people to embed a lot of duplicated knowledge of the application into the class structure. This duplicated information makes it hard to change the system. I have seen this again and again. In contrast, before OO a lot of this information ended up in data or in a database, where it could be easily changed.
Re:KISS by Brian_Ellenberger · 2005-05-24 10:20 · Score: 3, Insightful

Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.

Google is a very unique case. There are two things in Google's advantage that most RDBMS system have to take into account:

1. Google does not have to update in realtime. If I add a page to my website it is not immediately available on Google. Contrast this to a normal RDBMS where if I add a record it must be available immediately. Google is much more similar to a Data Warehouse than a RDBMS.

2. Websites indexes are more easily parallelized because complex joining of data is not needed. Naively, Google can store all websites starting with 'A' on a server, 'B' on a server, etc. You can store the User table and the Address table on separate database servers and expect to query on users and their addresses with any sort of performance.

3. Going along with 3, the queries expected out of most RDBMS systems are much much more complex than any queries expected out of Google right now. I'm assuming you haven't seen any complex financial reports or statistics that have been generated from a RDBMS. Databases do alot more than "select name from user where id=1234".

Brian
Re:KISS by isomeme · 2005-05-24 10:30 · Score: 2, Insightful

Google's web index and desktop search facility is a database. I don't know about point 1, but Google definitely blows any relational database out of the water on point 2 to 4.

Google's performance and value are both amazing. But it's easy to drive yourself nuts (if you're an enterprise software architect, which I am) trying to get that kind of performance out of other types of application. You see, Google has two major advantages over nearly all other large data-backed applications:

1. There is no "right" answer

Google keeps their ranking and indexing schemes proprietary, so nobody can say what "should" come back from a given search. Indeed, execute the same query at the same time from different machines, or the same machine at different times, and you sometimes get wildly varying results. SEO folks call this the "Google Dance".

2. Writes are asynchronous with reads

As near as anyone can tell, the Google index is rebuilt by their crawler over the span of a few weeks, and then the whole new index is exported to production machines over the span of a few days. Only the crawler writes data to the index, and the index as it's being built is not read by end-user clients; the production index is *only* read.

These advantages let Google use a distributed, loosely coupled, inconsistent server farm made of cheap boxes that needn't be in sync with one another. It doesn't matter if queries to two of them give different answers, after all.

Contrast this with (e.g.) an online bookstore; once you order a book, every node in every part of the system needs to know about that order in order to keep everything consistent (stock level monitoring, end-user purchase tracking, and so forth). This is a much, much harder problem to solve.

So again, not to take anything away from Google's tech (which I more or less worship), but it's not a fair point of comparison for most large enterprise apps.

--
When all you have is a hammer, everything looks like a skull.

Just because something is "old" by Anonymous Coward · 2005-05-24 07:17 · Score: 4, Insightful

Doesn't make it obsolete. "Databases are old and kludgey. Teh suXX0rs for R0xxng H4XX0rs liek me.

Just because people are too stupid to take the time to read and understand the theory and learn the application doesn't mean the technology is no longer relevant.

Of course no solutions are proposed. There are none because relational theory is correct, and appropriate for real database driven applications. Little crap bulletin boards can use MySQL.

Netcraft confirms relational databases are dead!

Re:Just because something is "old" by Frank+T.+Lofaro+Jr. · 2005-05-24 08:06 · Score: 2, Insightful

Little crap bulletin boards can use MySQL.

Like Slashdot :) Remember yesterday's fiasco where posts were migrating into other articles and the time before that where it happened and the time before that. :)

Why people don't use PostgreSQL is beyond me - unless they don't like fact it is under a BSD license instead of the GPL (please don't get me started on that).

--
Just because it CAN be done, doesn't mean it should!
Re:Just because something is "old" by darthium · 2005-05-24 08:07 · Score: 2, Insightful

I concur with this opinion, I've seen a lot of supposedly 'Top notch' designs and I've seen flaws that made me blush myself....mostly the problem is with stupid design mistakes and poor concepts.

Instead of blaming the technology and tools, they should improve skills in the Sytem Artchitects and all the way down the road the people involved in Software Development.

Wake me up when it's ready by Vile+Slime · 2005-05-24 07:17 · Score: 3, Insightful

People,

Have been crying for the need to replace relational databases since the early nineties at least.

We can all see where that got them.

--
---- Go ahead, mod me down, I'll just post it again and you lose your mod points.

. . . no concrete solutions are included. . . by kfg · 2005-05-24 07:18 · Score: 3, Insightful

Funny how they never are, eh?

KFG

Re:I did not RTFA by AKAImBatman · 2005-05-24 07:19 · Score: 5, Insightful

I didn't RTFA but for my needs

Or the summary

mySQL suits me quite well.

That's nice. It won't handle a multi-terabyte database, though. That's the domain of Terabase, Oracle, and (blech) DB2. It's also what the article is about.

The power of PHP and mySQL is all I need.

And a moped is all you need to get to work. If you want to haul 300 metric tons of rock from point A to point B, you need a dump truck. Again, that's what this article is about.

Back on topic, this entire article is mostly speculative for the moment. A lot of excellent work has been done in OODB and XMLDB designs, but no singular design has yet emerged to solve all our woes. For example, I love the Prevayler concept. It solves a lot of problems, lowers data access times, and provides for complete data security. It also isn't usable or scalable without a lot more design work.

The future will hold some very interesting things, but for now we'll have to keep inventing until we come up with a consolidated solution.

--
Javascript + Nintendo DSi = DSiCade

Re:Databases? Bah! by techwolf · 2005-05-24 07:22 · Score: 2, Insightful

That's what Perl is for...

--
I don't do this for karma, I do it for cash. It's much better.

Re:I did not RTFA by techwolf · 2005-05-24 07:28 · Score: 5, Insightful

Quite true. MySQL does very well into the gigabytes. I haven't seen any good evidence of its abilities in handling terabytes of data. Don't get me wrong, I'm a huge fan of the MySQL, but I'm a bigger fan of using the right tool for the job. For your web message board, MySQL works fine. For holding product, sales, distribution, etc. information for, say Levis, it would not.

--
I don't do this for karma, I do it for cash. It's much better.

Re:SQL Dying...film at 11...NOT! by tyler_larson · 2005-05-24 07:30 · Score: 2, Insightful

After extensive research, our Databases Of The Future department has come up with the following prediction:

In the future, databases will be more "good."

While we can't yet go into detail as to how this will all work, suffice it to say that we have a pretty solid idea what the future holds.

--
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925

Why 'Beyond'? by Anonymous Coward · 2005-05-24 07:34 · Score: 5, Insightful

Designed in the 1970s, the RDBMS has nevertheless proven to be the cornerstone of Web development three decades later. Thanks to systems like MySQL deployments are surely at record levels.

Essayist Clay Shirky has gone to far as to suggest that MySQL is at the center of a whole new software movement.

In my experience with Web applicaions the chief problem with the RDBMS seems to be that it does not do text indexing and search very well, so I have to keep a second store of data in something like Lucene.

The other major problem is the level of skill required to tune the database to achieve high-performance SQL queries, so hopefully the RDBMS will evolve with more self-configuration capability.

The article, which I only skimmed, actually addresses these two concerns but seems to pooh-pooh the notion of simply refining the existing RDBMS systems. Instead it says " Old-style database systems solve old-style problems; we need new-style databases to solve new-style problems. "

The paper seems awfully squishy on what this means. The clearest I found was a call to "produce a storage engine that is more configurable so that it can be tuned to the requirements of individual applications."

But this call for new highly modular/configurable storage "engines" seems to me to require at least as much fussy care and feeding as a traditional RDBMS. You're just replacing one DBA with another. And throwing out decades of refinement in the process.

The raison d'etre of the RDBMS is to allow the programmer to treat storage as a black box while gaining nifty ACID features. Extending this to text indexing seems logical.

Kind of OT SQL Question by Over_and_Done · 2005-05-24 07:35 · Score: 2, Insightful

I agree with the sentiments of the posters that SQL is not going anywhere, but I had a question.
As I am designing more and more complex web apps, I am constantly having to think of new, innovative ways to design the tables and databases and am currently making it up as I go. Does anyone have a reccomendation for books/sites that talk about good design proactices, that is not "How to use SQL" and relatively agnostic on the specific brand on DB?
Sorry for the OT post, its just something that has been bugging me for a while

the usual database blatherings by Anonymous Coward · 2005-05-24 07:49 · Score: 2, Insightful

by MARGO SELTZER, SLEEPYCAT

Sleepycat? The guys who make a brain-dead key/value database with no data manipulation or integrity capabilities? Who are they to educate others on the topic of relational databases? (Sleepycat's products are useful tools, but they are not true databases).

while data management has become almost synonymous with RDBMS, however, there are an increasing number of applications for which lighter-weight alternatives are more appropriate.

Ahh, so the proper title of this paper should be: "Beneath Relational Databases" or "Below Relational Databases". Because the relational model is a *complete* model for data storage and manipulation, so if you have a subset of this functionality, you are not "beyond" it.

As argued by Stonebraker, the relational vendors have been providing the illusion that an RDBMS is the answer to any data management need. For example, as data warehousing and decision support have emerged as important application domains, the vendors have adapted products to address the specialized needs that arise in these new domains. They do this by hiding fairly different data management implementations behind the familiar SQL front end. This model breaks down, however, as one begins to examine emerging data needs in more depth.

Well, the mention of Stonebraker's name as an authority on databases is generally an indicater of a content-free paper, but let's be sure we're talking about the same thing: the relational *model* is a *complete* model. There is no other more effective model, in fact as far as I know, there are no other complete models!

So if you want to use the relational model as a foundation to build new database products, go right ahead. If you're talking the same old vendor BS about "post relational" or "XML" (hierarchical) or "object" (network and/or hierarchical), then please shut up!!

My feeling is when he says "in depth", he means "less depth".

As more documents are created, transmitted, and operated in XML, these translations become unnecessary, inefficient, and tedious. Surely there must be a better way. Native XML data stores with XQuery and XPath access patterns represent the next wave of storage evolution. While new items are constantly added to and removed from an XML repository, the documents themselves are largely read-only.

Uh, yes there is a better way: create an XML data type in a relational database with a full set of XML operators. The relational model doesn't care about data types.

I have no interest in giving up the general relational model for a hierarchic model (rejected decades ago as not being general enough) based on a TEXT FILE FORMAT.

Stream processing is a bit of an outcast in this laundry list of data-intensive applications.

I smell Stonebraker.. yes, it's an outcast because stream processing has nothing to do with data storage!!!

Some argue that database architecture is in need of a revolution akin to the RISC revolution in computer hardware

Yes, all these people need to study and understand the relational model which was developed 30 years ago and is still the only complete data model. The relational model can be described in half a page, and consists of a small number of core operations from which any possible data storage and manipulation need can be developed. Stop thinking about implementations, think about the *model* and then use that develop new implementations!!

Old-style database systems solve old-style problems; we need new-style databases to solve new-style problems.

What does this mean exactly? I need to store and manipulate data without limitations. The relational model offers this. What is "old" or "new" here? I'm not going to switch to an ad-hoc subset of the relational model because it's "new".

This "paper" (wasn't there one a couple weeks from some Microsoft dude, which was equally useless?) commits the same old sins: 1) look at existin

Re:Relational Filesystems by AKAImBatman · 2005-05-24 07:57 · Score: 2, Insightful

We've already got RDBMS tech - why reinvent an inadequate version of it?

Because current RDBMS designs are unsuitable for filesystems. Relational theory still holds (just as it does for OODBs), but the physical design should be quite different if it's going to be effecient.

As I said, this has been beaten to death in the research communities. BeOS even included a DBFS design, but it went largely unused. NTFS also has all the necessary stuff in it, but Microsoft constantly removes it in final releases. ReiserFS has DBFS features, but these also go largely unused.

I think the problem is that making effective use of a DBFS requires a very different set of applications. i.e. If the applications are aware of the functionality, then they can assist the user and provide useful support. But without this form of OS and application support, the user will find that the metadata is nothing but added confusion.

--
Javascript + Nintendo DSi = DSiCade

Re:AS-400 by Monkelectric · 2005-05-24 08:04 · Score: 2, Insightful

As/400's are great for accounting and manufacturing. They are *NOT* used by IT professionals or Scientists and are only used in accounting (primarily gambling -- to monitor slot machines) and manufacturing.

AS/400's are hopelessly complex even to seasoned IT professionals such as myself, and they're only around is not because people like them but because the work SO GOD DAMNED WELL :)

point being, tractors work well to but you dont drive one day to day do you? :)

--

Religion is a gateway psychosis. -- Dave Foley

Re:A thought on XML documents by The+Slashdolt · 2005-05-24 08:29 · Score: 4, Insightful

Here is the problem with your idea. Unlike the relational model, XML does not link facts. XML documents can be joined in any way, either valid or invalid, without you knowing one way or another. The relationships between documents are weak. There is no referential integrity. Within a proper relational model you are stating facts and factual relationships. Joins of those facts generate derived facts that are as true and accurate as your original model. Why add the overhead and complexity of xml? Why not just use a proper relational model?

--
mp3's are only for those with bad memories

Re:How about.... by Anonymous Coward · 2005-05-24 08:39 · Score: 2, Insightful

1) a table is *not* a class or an object.

Good advice. A table is simply a snapshot of a relational value. True relational values have no top/bottom or left/right ordering so you can't really show them on paper or on the screen. It doesn't make sense to map classes or objects to tables.

2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only once rule.

Normalization has nothing to do with integrity per se. You can add constraints to a denormalized database that make it logically equivalent to a normalized one. I'm not aware of any "once and only once" rule in relational theory.

Normalization is about dependencies: your data shouldn't have certain types of dependencies in it.

As a rule of thumb if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.

Wrong, the number of fields has nothing to do with normalization. And you should review your data model even if it has 1 field.

Ditch Raid 5. 0+1 will give better perfomance in most cases. Manager like Raid 5 because it is cheap, you get what you pay for.

Yes, 0+1 is faster, but lots of RAM is even faster. However this is a physical detail and not anything to do with the relational model.

6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble.

This should be point #1 with a double underline. The lack of understanding and even hostility toward theory is frightening in the IT industry. If the author of the paper above had any foundation knowledge her paper would be much different.

8) Avoid XML. Too much bloat.

Just remember that XML is FILE FORMAT like Jpeg, CSV, and you'll do fine.

9) Learn how to use indices on tables.

Christ, I hope people know this one! They should also realize that an index is a PHYSICAL manifestion of a logical idea: keys.

Re:5th normal form model of perfection. by Frank+T.+Lofaro+Jr. · 2005-05-24 08:45 · Score: 2, Insightful

If full 3rd normal form is too slow, you need better hardware.

If a few joins (assuming you've got indexes, etc setup) make it too slow you are too close to the edge.

Saving on hardware but spending more in dealing with data problems is false economy.

--
Just because it CAN be done, doesn't mean it should!

JDBC further standardizes SQL dialects by irritating+environme · 2005-05-24 09:15 · Score: 2, Insightful

JDBC (probably ODBC too, tho haven't used it in eight years) helps to standardize key generation (in JDBC 3.0 FINALLY), and Date processing (christ, date functions are so annoying). Most other operations can usually be done by the platform language that is processing the data, so you can avoid the tie-in that results from various SQL dialects' built-in functions. XML databases were a total flop, and so were object databases. I agree, SQL engines are so mature now that I don't see any database tech replacing them for another ten years. By the way, can we get PostGres (and now Oracle's) support of regular expression LIKEs standardized? And can we please get JDBC to support something like: INSERT $price INTO pricetable where productid = $productid rather than using ?'s and counting which ? to set values to? Hibernate's query language does this, and I really like it.

--

Hey, I'm just your average shit and piss factory.

XML documents are an hierarchical database by irritating+environme · 2005-05-24 09:18 · Score: 2, Insightful

An XML doc is a tree, thus it is hierarchically organized data. There have been hacks to try to extend around this limitation, but relational data still has superior flexibility.

that's why XML databases flopped

--

Hey, I'm just your average shit and piss factory.

Re:SQL isn't a database by Shadowlore · 2005-05-24 13:49 · Score: 2, Insightful

Let me ask you this: How often do you see an OSS product (EG: phpwiki) that doesn't offer support for numerous databases?

Quite a bit actually. In my experience most only support MySQL. Why? They don't understand the value of data integrity and the features MySQL lacks/lacked. As a result they have had to poorly implement things that the DB itself should do, and when they try to "port" this to another DB they run into all sorts of issues.

And SQL is a language, not an API.
So they give up.

--
My Suburban burns less gasoline than your Prius.

27 of 360 comments (clear)