Slashdot Mirror


Beyond Relational Databases

CowboyRobot writes "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems. But modern warehousing of data results in terabytes of information that needs to be organized, and the growing prevalence of mobile devices points to the increasing need for intelligent caching on the local hardware. According to the ACM, the future of database architecture must include more modularity and configuration. Although no concrete solutions are included, the article is a good overview of the problems with modern data systems."

16 of 360 comments

  1. KISS by mcrbids · · Score: 5, Insightful

    Some of the biggest problems that "new" database designs have:

    1) Overly complex

    2) Don't scale

    3) Tied to a single platform/implementation

    4) Poor performance

    It's typical to see all four in a single try!

    SQL, on the other hand:

    1) Reasonably simple API

    2) Scales to very large databases

    3) Cross-platform/architecture

    4) Performs very well.

    Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
  2. Just because something is "old" by Anonymous Coward · · Score: 4, Insightful

    Doesn't make it obsolete. "Databases are old and kludgey. Teh suXX0rs for R0xxng H4XX0rs liek me."

    Just because people are too stupid to take the time to read and understand the theory and learn the application doesn't mean the technology is no longer relevant.

    Of course no solutions are proposed. There are none because relational theory is correct, and appropriate for real database driven applications. Little crap bulletin boards can use MySQL.

    Netcraft confirms relational databases are dead!

  3. Author's conclusion in case of slashdotting by Anonymous Coward · · Score: 5, Funny

    The future will not be found in the relational model, object model, or hybrid, but in the comma-delimited list.

  4. Re:I did not RTFA by AKAImBatman · · Score: 5, Insightful

    I didn't RTFA but for my needs

    Or the summary

    mySQL suits me quite well.

    That's nice. It won't handle a multi-terabyte database, though. That's the domain of Terabase, Oracle, and (blech) DB2. It's also what the article is about.

    The power of PHP and mySQL is all I need.

    And a moped is all you need to get to work. If you want to haul 300 metric tons of rock from point A to point B, you need a dump truck. Again, that's what this article is about.

    Back on topic, this entire article is mostly speculative for the moment. A lot of excellent work has been done in OODB and XMLDB designs, but no singular design has yet emerged to solve all our woes. For example, I love the Prevayler concept. It solves a lot of problems, lowers data access times, and provides for complete data security. It also isn't usable or scalable without a lot more design work.

    The future will hold some very interesting things, but for now we'll have to keep inventing until we come up with a consolidated solution.

  5. SQL Dying...film at 11...NOT! by Gorm+the+DBA · · Score: 4, Interesting
    Wow..."SQL and Relational Databases to be replaced by new technology"...film at 11.

    See "COBOL to be replaced...." for an example of just how unlikely that is...sure, the latest hip "Tres Kewl" software for business might be written in something else, but SQL will be around for a long, long time.

    Consider just the fact that "Relational Database" technology as laid out by Codd back in the early days specifically says "You don't *HAVE* to do it this way, but it will be more efficient if you do"...realize that SQL handles Denormalized Warehouse and Datamart tables just as well as it does the 5th normal form model of perfection...and relax...it ain't goin nowhere.

  6. Relational Filesystems by Doc+Ruby · · Score: 5, Interesting

    How about just getting filesystems to be relational? Replace the ancient 1960s-era hierarchical inode database that underlies filesystems with a modern relational one. Then distributed databases can provide a more consistent platform for all our distributed apps.

    Enough stuffing metadata into filenames. Enough shoehorning all data into a file/folder/cabinet model, now less familiar to people than the networked infosystems that mimic them. Enough fake hierarchies inconsistent with accurate data models, forcing whole technologies like Apple Spotlight, GNU Dashboard, and Google Search just to transact basic relationships buried in the data. Enough reinvention of the wheel with every initial RDBMS schema, just a layer on top of the DB's actual hierarchical filesystem - a shell for an inode database. Enough empty promises of "WinFS" and "OLEDB" vapor - get relational filesystems into developers' hands, and developers will move beyond them, building apps that meet users' actual needs, dragging the database tech along.

    --
    make install -not war

  7. Re:I did not RTFA by techwolf · · Score: 5, Insightful

    Quite true. MySQL does very well into the gigabytes. I haven't seen any good evidence of its abilities in handling terabytes of data. Don't get me wrong, I'm a huge fan of MySQL, but I'm a bigger fan of using the right tool for the job. For your web message board, MySQL works fine. For holding product, sales, distribution, etc. information for, say, Levi's, it would not.

    --
    I don't do this for karma, I do it for cash. It's much better.
  8. SQL isn't a database by Nytewynd · · Score: 5, Informative

    SQL, on the other hand:
    1) Reasonably simple API
    2) Scales to very large databases
    3) Cross-platform/architecture
    4) Performs very well.
    Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)


    SQL is a language for set operations. By itself it isn't a database or storage utility. There are some different versions similar to what you describe. Oracle's PL/SQL allows you to make temporary tables and materialized views. Neither solves the overall problem the article describes.

    SQL by itself doesn't perform. Performance depends on the database engine and on how good the developer is. I have gotten SQL queries that took minutes to execute down to seconds by adding indexes, analyzing tables, and totally rewriting inefficient code. It is only "cross-platform" if you follow the ANSI SQL standard. Each database has its own set of handy functions that make the code database-centric.
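
    A minimal sketch of the indexing point, assuming SQLite through Python's standard sqlite3 module rather than Oracle. The `orders` table, its columns, and the index name are made up for illustration. It asks the engine for its query plan before and after an index is added; the exact plan wording varies by SQLite version, so the plans are printed rather than quoted here.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the description text.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index the engine has to walk the whole table.
plan_before = query_plan(conn, sql, (42,))

# Same query text, different physical design.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

    The query text never changes between the two runs; only the structure underneath it does, which is the claim being made about where performance comes from.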

    SQL doesn't really have an API. It is a specification that is sometimes followed by database designers, and sometimes ignored. For example, in Oracle you can either use the ANSI join syntax (LEFT OUTER JOIN) or use the (+) in the WHERE clause.
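
    To make the portability point concrete, here is a small sketch of the ANSI form, again assuming SQLite via Python's sqlite3 with hypothetical `departments` and `employees` tables. The Oracle-only (+) spelling appears only in a comment, since SQLite does not accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO employees VALUES (1, 'Ann', 1);
""")

# ANSI syntax: keep every department, even one with no employees.
# The older Oracle-specific spelling would be roughly
#   SELECT d.name, e.name FROM departments d, employees e
#   WHERE d.id = e.dept_id (+)
# and is not portable to other engines.
rows = conn.execute("""
    SELECT d.name, e.name
    FROM departments d
    LEFT OUTER JOIN employees e ON e.dept_id = d.id
    ORDER BY d.id
""").fetchall()

print(rows)
```

    The department without employees still appears, paired with a NULL (None in Python) for the employee name.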

    It scales to large databases only when they are designed properly. I work with 18 terabytes of data. My SQL code wouldn't work so hot if the tables weren't designed correctly. Indexing, partitioning, and table structure have more to do with performance at that level than the code. The code can make a large difference too, but if the underlying structure is wrong, even the best SQL won't help you.

    --
    /. ++
  9. Re:I did not RTFA by abigor · · Score: 5, Funny

    Yeah, for those terabytes of data taken up by your mom's recipes and your cd collection, the extreme power of PHP and MySql is all you need, man.

  10. Why 'Beyond'? by Anonymous Coward · · Score: 5, Insightful

    Designed in the 1970s, the RDBMS has nevertheless proven to be the cornerstone of Web development three decades later. Thanks to systems like MySQL, deployments are surely at record levels.

    Essayist Clay Shirky has gone so far as to suggest that MySQL is at the center of a whole new software movement.

    In my experience with Web applications the chief problem with the RDBMS seems to be that it does not do text indexing and search very well, so I have to keep a second store of data in something like Lucene.

    The other major problem is the level of skill required to tune the database to achieve high-performance SQL queries, so hopefully the RDBMS will evolve with more self-configuration capability.

    The article, which I only skimmed, actually addresses these two concerns but seems to pooh-pooh the notion of simply refining the existing RDBMS systems. Instead it says "Old-style database systems solve old-style problems; we need new-style databases to solve new-style problems."

    The paper seems awfully squishy on what this means. The clearest I found was a call to "produce a storage engine that is more configurable so that it can be tuned to the requirements of individual applications."

    But this call for new highly modular/configurable storage "engines" seems to me to require at least as much fussy care and feeding as a traditional RDBMS. You're just replacing one DBA with another. And throwing out decades of refinement in the process.

    The raison d'etre of the RDBMS is to allow the programmer to treat storage as a black box while gaining nifty ACID features. Extending this to text indexing seems logical.

  11. Improving the efficiency???? by Dammital · · Score: 5, Informative
    "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems."
    Huh? Go back and reread some of Codd's papers (in the late 60's, BTW) and you'll see that efficiency was never a motivator. Simplicity was his aim, filesystem details were made irrelevant, explicit navigation was obsoleted, and a built-in security model was included.

    When relational systems finally began to appear (and I'm thinking specifically about IBM's System R) they were dog slow, and the extant hierarchical and CODASYL network databases of the day ran rings around them. Still do, unless you throw lots of hardware at the RDBMS.

    RDBMS have lots of advantages over older technologies, but performance is not among them.

  12. Re:KISS (I can prove SQL will be around) by gosand · · Score: 5, Funny
    SQL, on the other hand:
    1) Reasonably simple API
    2) Scales to very large databases
    3) Cross-platform/architecture
    4) Performs very well.

    I am proof that SQL will be around for a while. When I first saw Unix back in the late 80s, I thought "this is too hard to use, why would anyone need this?" I have been a Unix/Linux user since about '92.

    When I took my first SQL class, I thought "these queries are very cumbersome. SQL is stupid." I still use it today.

    In '93 I heard about this thing called the World Wide Web, and thought "This is unnecessary. I can find whatever I need on gopher and ftp sites. Why would I want a gui thrown on top of it?"

    As you can see, I am quite the visionary.

    --

    My beliefs do not require that you agree with them.

  13. Re:Multi-Terabyte Data Warehouse and MySQL by AKAImBatman · · Score: 4, Interesting

    I have no doubt that terabytes could be stored in MySQL. My overall point is that MySQL is not designed to effectively manage that much data. For example, the presentation you link to shows that Terabase is the workhorse of the business. Data is then offloaded to a disposable MySQL database for data warehousing analysis. The database is then purged after one week.

    The holy grail of information technology would be to eliminate the need for such cumbersome replications, and instead have a single, reliable data source that can be queried for any information needed at any time. Unfortunately, MySQL isn't it. ;-)

  14. How about.... by plopez · · Score: 4, Informative

    really implementing a relational model to begin with? Then we can decide if the relational model is broken or just the vendor implementation.

    How about... a query language that is fully set operations compliant, i.e., something other than ANSI SQL which is a strange mixture of set and bag operations, and a mixture of relational algebra and relational calculus and some other 'extensions'.

    How about... realizing that a major design goal for the relational model was data integrity. Modularity and configurability are also good goals but if you are serious about your data, integrity will be at the top of the list.

    The biggest problem I see with databases is that very few people understand how to use them. Here are a few tips:
    1) a table is *not* a class or an object. Tables + constraints + user defined types + constraints etc. when used properly can define domains which are close to classes and objects.

    2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only-once rule. As a rule of thumb, if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.

    3) Point 2 is often the consequence of mindlessly slurping in spreadsheets or MS Access database tables. Anyone doing this has no business being within 50 feet of an IDE.

    4) Ditch RAID 5. 0+1 will give better performance in most cases. Managers like RAID 5 because it is cheap; you get what you pay for.

    5) Have multiple channels for data, transaction logs, large indices and O/S or user applications to reduce bottlenecks. This is expensive but for large databases going cheap will hurt you.

    6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble. Do not be afraid of learning about the technology you are using. After all, technology is what you are good at, right?

    7) If it is a read-only database, turn off logging for speed (impossible to do under SQL Server 2000 btw). Also, if a table is on a purge-and-load paradigm (many reporting and/or data warehouse tables are) turn off logging on the table level if your version of database engine allows you to do so. Likewise, turning off logging on a handheld or other single-user system may be appropriate, just make sure two people do not try to use the database at the same time.

    8) Avoid XML. Too much bloat.

    9) Learn how to use indices on tables.

    10) Learn how to read a performance monitor/top etc.
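
    Tips 1 and 2 can be sketched in a few lines, assuming SQLite via Python's sqlite3 and a made-up `customers`/`orders` schema. The customer's city is stored once and derived by a join everywhere else, and a CHECK constraint narrows a column's domain beyond its bare type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
    INSERT INTO customers VALUES (1, 'Acme', 'Boston');
    INSERT INTO orders VALUES (10, 1, 5), (11, 1, 2);
""")

# Once-and-only-once: the city lives in one row, so a single UPDATE
# is reflected in every order that derives it through the join.
conn.execute("UPDATE customers SET city = 'Chicago' WHERE id = 1")
cities = conn.execute("""
    SELECT o.id, c.city
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Table + constraint: a zero quantity is outside the declared domain.
try:
    conn.execute("INSERT INTO orders VALUES (12, 1, 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(cities, rejected)
```

    Had the city been copied into every order row, the same correction would need one update per row, and any missed row would leave the data contradicting itself.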

    PostgreSQL is both working hard to become truly relational AND is adding support for geographic objects and objects. The MySQL crew is working hard to improve. Oracle has some nice performance features but I think their 'Object/Relational' implementation is broken. SQL Server is getting 'long in the tooth'. There is also a great need for temporal databases and lightweight engines. But remember, there is no 'silver bullet', no shortcuts. Just hard work to be done.

    --
    putting the 'B' in LGBTQ+
  15. Re:A thought on XML documents by The+Slashdolt · · Score: 4, Insightful

    Here is the problem with your idea. Unlike the relational model, XML does not link facts. XML documents can be joined in any way, either valid or invalid, without you knowing one way or another. The relationships between documents are weak. There is no referential integrity. Within a proper relational model you are stating facts and factual relationships. Joins of those facts generate derived facts that are as true and accurate as your original model. Why add the overhead and complexity of XML? Why not just use a proper relational model?
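
    A short sketch of what referential integrity buys, assuming SQLite via Python's sqlite3 and hypothetical `authors` and `books` tables. Note that SQLite only enforces foreign keys once the pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Alice');
    INSERT INTO books VALUES (1, 'Sets and Bags', 1);
""")

# A book pointing at a nonexistent author is refused by the engine,
# so an invalid "link" can never be stored in the first place.
try:
    conn.execute("INSERT INTO books VALUES (2, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Every joined row is therefore a derived fact backed by stored facts.
derived = conn.execute("""
    SELECT b.title, a.name
    FROM books b
    JOIN authors a ON a.id = b.author_id
""").fetchall()

print(orphan_rejected, derived)
```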

    --
    mp3's are only for those with bad memories
  16. Hmm .. by ghakko · · Score: 5, Informative

    Has anyone noticed that the author of the article is from Sleepycat (which sells commercial licenses for Berkeley DB to embedded systems developers)?

    She puts forth a case against SQL and relational databases in general and claims that many applications (like directory services and search engines) have read-heavy, hierarchical access patterns which favour lighter-weight, non-relational, transaction-optional databases.

    And .. it just so happens that Sleepycat's flagship products are Berkeley DB (a flat-file database) and DBXML (an XQuery engine built on top of that).